Initial processing

This document describes the analysis of TCR repertoires for the manuscript “Unique roles of coreceptor-bound LCK in helper and cytotoxic T cells” by Horkova et al. The analysis workflow included:

  • sorting of CD4 and CD8 cells from lymph nodes and thymi of LckWT, LckCA and LckCA/KR mice,
  • isolation of RNA using RNA Clean & Concentrator-5 kit
  • preparation of TCR-enriched libraries using the NEBNext® Mouse Immune Sequencing Kit
  • sequencing the libraries on Illumina Miseq, 300bp paired-end reads
  • demultiplexing reads in Illumina BaseSpace
  • processing fastq files using the recommended pRESTO workflow on Galaxy with default parameters
  • aligning processed fastq files using MiXCR

Raw data are deposited in the Sequence Read Archive (PRJNA872031). Processed files can be downloaded download from Zenodo. The Zenodo archive contains the following files, which are needed to run this script:

  • merged outputs from MiXCR merged_TCR_repertoires.csv
  • metadata file metadata_Lck.csv
  • count tables with raw counts of CDR3 TRA and TRB sequences in each sample count_table_tra.csv, "count_table_trb.csv"
  • TRA and TRB repertoires prepared for processing with the Immunarch package immdata_tra.rds, immdata_trb.rds

Let’s read the required files:

#
merged_repertoires <- read.csv("merged_TCR_repertoires.csv")
md <- read.csv("metadata_Lck.csv")

counts_tra <- read_csv("count_table_tra.csv")
counts_tra$...1 <- NULL

counts_trb <- read_csv("count_table_trb.csv")
counts_trb$...1 <- NULL

Count tables

We will start with processing of the file merged_TCR_repertoires.csv. This file contains outputs of MiXCR software, i.e., all the assembled CDR3 clonotypes, their counts in each sample, nucleotide and amino-acid sequences, all V, D and J alignments with scores, together with some meta-data. the column num_id contains info about the sample number, which we will frequently use to attach meta-data to sample numbers.

Here, we summarize the variables in the file merged_TCR_repertoires.csv:

ExpData(data=merged_repertoires, type=2)
##    Index     Variable_Name Variable_Type Sample_n Missing_Count Per_of_Missing
## 1      1           cloneId       integer  1001983             0          0.000
## 2      2        cloneCount       integer  1001983             0          0.000
## 3      3     cloneFraction       numeric  1001983             0          0.000
## 4      4   targetSequences     character  1001983             0          0.000
## 5      5   targetQualities     character  1001983             0          0.000
## 6      6 allVHitsWithScore     character  1001983             0          0.000
## 7      7 allDHitsWithScore     character   566670        435313          0.434
## 8      8 allJHitsWithScore     character  1001983             0          0.000
## 9      9 allCHitsWithScore     character      955       1001028          0.999
## 10    10    allVAlignments     character  1001983             0          0.000
## 11    11    allDAlignments     character   566670        435313          0.434
## 12    12    allJAlignments     character  1001983             0          0.000
## 13    13    allCAlignments     character      144       1001839          1.000
## 14    14          nSeqCDR3     character  1001983             0          0.000
## 15    15       minQualCDR3       integer  1001983             0          0.000
## 16    16           nSeqFR4       logical        0       1001983          1.000
## 17    17         aaSeqCDR3     character  1001983             0          0.000
## 18    18            num_id       integer  1001983             0          0.000
## 19    19             chain     character  1001983             0          0.000
## 20    20      Mouse_strain     character  1001983             0          0.000
## 21    21         Cell_type     character  1001983             0          0.000
## 22    22             Organ     character  1001983             0          0.000
## 23    23        Cell_count       numeric  1001983             0          0.000
## 24    24               Exp       integer  1001983             0          0.000
## 25    25          INDEX_i7       integer  1001983             0          0.000
## 26    26          INDEX_i5       integer  1001983             0          0.000
##    No_of_distinct_values
## 1                  68138
## 2                    120
## 3                    994
## 4                 823904
## 5                 746125
## 6                  93452
## 7                    327
## 8                   6929
## 9                    122
## 10                 14932
## 11                 13102
## 12                 17988
## 13                     1
## 14                823904
## 15                    70
## 16                     0
## 17                541554
## 18                    28
## 19                     2
## 20                     3
## 21                     2
## 22                     2
## 23                    28
## 24                     3
## 25                    12
## 26                     3

First, we will generate summary count tables for TRA and TRB CDR3 sequences (these tables are provided as Supplementary table S1 in the manuscript). We will use pivot_wider function and we will change the num_id of the samples to meaningful names.

Here, we show the resulting table for TRA:

count_table_tra <- merged_repertoires %>% 
  dplyr::filter(chain == "TRA") %>% 
  dplyr::select(aaSeqCDR3, num_id, cloneCount)

count_table_tra2 <- count_table_tra %>% 
  pivot_wider(names_from = num_id, values_from = cloneCount, values_fill = 0, values_fn = sum)

md2_tra <- merged_repertoires %>% filter(chain == "TRA") %>% group_by(aaSeqCDR3) %>% slice_head(n = 1)

excel_count_table_tra <- counts_tra %>% left_join(md2_tra %>% select(allVHitsWithScore, allDHitsWithScore, allJHitsWithScore))

md_to_join <- md %>% mutate(new_name = paste(Cell_type, Organ, Mouse_strain, paste0("Exp0",Exp)))  %>% select(num_id, new_name) %>% arrange(num_id)

order <- match(as.numeric(colnames(counts_tra)[2:29]), md_to_join$num_id)

colnames(excel_count_table_tra)[2:29] <- pull(md_to_join, new_name)[order]
excel_count_table_tra2 <- excel_count_table_tra %>% dplyr::select(1, 31, 32, 30, 22, 12, 25, 11, 6, 26, 13, 18, 4, 28, 17, 14, 8, 23, 9, 7, 16, 24, 2, 21, 5, 29, 10, 20, 19, 3, 15, 27)

excel_count_table_tra2 %>% head
## # A tibble: 6 × 32
##   aaSeqCDR3      allDH…¹ allJH…² allVH…³ CD4 L…⁴ CD4 L…⁵ CD4 L…⁶ CD4 L…⁷ CD4 L…⁸
##   <chr>          <chr>   <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 CAASASSGSWQLIF <NA>    TRAJ22… TRAV14…      38      24      73      30      43
## 2 CAASNMGYKLTF   <NA>    TRAJ9*… TRAV7-…      16      11      21      15      14
## 3 CAVSASSGSWQLIF <NA>    TRAJ22… TRAV3D…      10       5       6       5       9
## 4 CAASDNYAQGLTF  <NA>    TRAJ26… TRAV14…       4       0       6       8       3
## 5 CALSDRYNQGKLIF <NA>    TRAJ23… TRAV12…       1       1       8       0       1
## 6 CAASDDTNAYKVIF <NA>    TRAJ30… TRAV14…       0       0       1       4       2
## # … with 23 more variables: `CD4 Lymph nodes WT Exp02` <dbl>,
## #   `CD4 Lymph nodes WT Exp03` <dbl>, `CD4 Thymus CA Exp01` <dbl>,
## #   `CD4 Thymus CA Exp03` <dbl>, `CD4 Thymus CAKR Exp01` <dbl>,
## #   `CD4 Thymus CAKR Exp02` <dbl>, `CD4 Thymus CAKR Exp03` <dbl>,
## #   `CD4 Thymus WT Exp02` <dbl>, `CD4 Thymus WT Exp03` <dbl>,
## #   `CD8 Lymph nodes CA Exp02` <dbl>, `CD8 Lymph nodes CA Exp03` <dbl>,
## #   `CD8 Lymph nodes CAKR Exp02` <dbl>, `CD8 Lymph nodes CAKR Exp03` <dbl>, …
ExpData(data=excel_count_table_tra2, type=2)
##    Index              Variable_Name Variable_Type Sample_n Missing_Count
## 1      1                  aaSeqCDR3     character   114760             0
## 2      2          allDHitsWithScore     character     4769        109991
## 3      3          allJHitsWithScore     character   114760             0
## 4      4          allVHitsWithScore     character   114760             0
## 5      5   CD4 Lymph nodes CA Exp01       numeric   114760             0
## 6      6   CD4 Lymph nodes CA Exp02       numeric   114760             0
## 7      7   CD4 Lymph nodes CA Exp03       numeric   114760             0
## 8      8 CD4 Lymph nodes CAKR Exp02       numeric   114760             0
## 9      9 CD4 Lymph nodes CAKR Exp03       numeric   114760             0
## 10    10   CD4 Lymph nodes WT Exp02       numeric   114760             0
## 11    11   CD4 Lymph nodes WT Exp03       numeric   114760             0
## 12    12        CD4 Thymus CA Exp01       numeric   114760             0
## 13    13        CD4 Thymus CA Exp03       numeric   114760             0
## 14    14      CD4 Thymus CAKR Exp01       numeric   114760             0
## 15    15      CD4 Thymus CAKR Exp02       numeric   114760             0
## 16    16      CD4 Thymus CAKR Exp03       numeric   114760             0
## 17    17        CD4 Thymus WT Exp02       numeric   114760             0
## 18    18        CD4 Thymus WT Exp03       numeric   114760             0
## 19    19   CD8 Lymph nodes CA Exp02       numeric   114760             0
## 20    20   CD8 Lymph nodes CA Exp03       numeric   114760             0
## 21    21 CD8 Lymph nodes CAKR Exp02       numeric   114760             0
## 22    22 CD8 Lymph nodes CAKR Exp03       numeric   114760             0
## 23    23   CD8 Lymph nodes WT Exp01       numeric   114760             0
## 24    24   CD8 Lymph nodes WT Exp02       numeric   114760             0
## 25    25   CD8 Lymph nodes WT Exp03       numeric   114760             0
## 26    26        CD8 Thymus CA Exp01       numeric   114760             0
## 27    27        CD8 Thymus CA Exp03       numeric   114760             0
## 28    28      CD8 Thymus CAKR Exp02       numeric   114760             0
## 29    29      CD8 Thymus CAKR Exp03       numeric   114760             0
## 30    30        CD8 Thymus WT Exp01       numeric   114760             0
## 31    31        CD8 Thymus WT Exp02       numeric   114760             0
## 32    32        CD8 Thymus WT Exp03       numeric   114760             0
##    Per_of_Missing No_of_distinct_values
## 1           0.000                114760
## 2           0.958                   110
## 3           0.000                  2980
## 4           0.000                 31200
## 5           0.000                    28
## 6           0.000                    22
## 7           0.000                    37
## 8           0.000                    26
## 9           0.000                    35
## 10          0.000                    29
## 11          0.000                    35
## 12          0.000                    15
## 13          0.000                    32
## 14          0.000                    14
## 15          0.000                    14
## 16          0.000                    21
## 17          0.000                    31
## 18          0.000                    27
## 19          0.000                    21
## 20          0.000                    23
## 21          0.000                    17
## 22          0.000                    25
## 23          0.000                    26
## 24          0.000                    31
## 25          0.000                    30
## 26          0.000                     9
## 27          0.000                    23
## 28          0.000                    10
## 29          0.000                    11
## 30          0.000                    22
## 31          0.000                     9
## 32          0.000                    30

Here, we show the resulting table for TRB:

count_table_trb <- merged_repertoires %>% 
  dplyr::filter(chain == "TRB") %>% 
  dplyr::select(aaSeqCDR3, num_id, cloneCount)

count_table_trb2 <- count_table_trb %>% 
  pivot_wider(names_from = num_id, values_from = cloneCount, values_fill = 0, values_fn = sum)

md2_trb <- merged_repertoires %>% 
  filter(chain == "TRB") %>% 
  group_by(aaSeqCDR3) %>% 
  slice_head(n = 1)

excel_count_table_trb <- counts_trb %>% 
  left_join(md2_trb %>% 
              select(allVHitsWithScore, allDHitsWithScore, allJHitsWithScore))

md_to_join <- md %>% 
  mutate(new_name = paste(Cell_type, Organ, Mouse_strain, paste0("Exp0",Exp))) %>% 
  select(num_id, new_name) %>% 
  arrange(num_id)

order <- match(as.numeric(colnames(counts_trb)[2:29]), md_to_join$num_id)

colnames(excel_count_table_trb)[2:29] <- pull(md_to_join, new_name)[order]

excel_count_table_trb2 <- excel_count_table_trb %>% 
  dplyr::select(1, 30:32, 22, 12, 25, 11, 6, 26, 13, 18, 4, 28, 17, 14, 8, 23, 9, 7, 16, 24, 2, 21, 5, 29, 10, 20, 19, 3, 15, 27)

excel_count_table_trb2 %>% head
## # A tibble: 6 × 32
##   aaSeqCDR3      allVH…¹ allDH…² allJH…³ CD4 L…⁴ CD4 L…⁵ CD4 L…⁶ CD4 L…⁷ CD4 L…⁸
##   <chr>          <chr>   <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 CASSDSAETLYF   TRBV13… <NA>    TRBJ2-…       7       1       6       1      12
## 2 CASSDAEQFF     TRBV13… <NA>    TRBJ2-…       0       0       5       0       1
## 3 CASSDAGYEQYF   TRBV13… <NA>    TRBJ2-…       0       1       2       0       3
## 4 CASSDWGGYAEQFF TRBV13… TRBD2*… TRBJ2-…       3       1       1       4       0
## 5 CASSDSGGQDTQYF TRBV13… TRBD2*… TRBJ2-…       1       0       2       1       0
## 6 CASRDRNTEVFF   TRBV13… TRBD1*… TRBJ1-…       6       1       3       3       0
## # … with 23 more variables: `CD4 Lymph nodes WT Exp02` <dbl>,
## #   `CD4 Lymph nodes WT Exp03` <dbl>, `CD4 Thymus CA Exp01` <dbl>,
## #   `CD4 Thymus CA Exp03` <dbl>, `CD4 Thymus CAKR Exp01` <dbl>,
## #   `CD4 Thymus CAKR Exp02` <dbl>, `CD4 Thymus CAKR Exp03` <dbl>,
## #   `CD4 Thymus WT Exp02` <dbl>, `CD4 Thymus WT Exp03` <dbl>,
## #   `CD8 Lymph nodes CA Exp02` <dbl>, `CD8 Lymph nodes CA Exp03` <dbl>,
## #   `CD8 Lymph nodes CAKR Exp02` <dbl>, `CD8 Lymph nodes CAKR Exp03` <dbl>, …
ExpData(data=excel_count_table_trb2, type=2)
##    Index              Variable_Name Variable_Type Sample_n Missing_Count
## 1      1                  aaSeqCDR3     character   426800             0
## 2      2          allVHitsWithScore     character   426800             0
## 3      3          allDHitsWithScore     character   331672         95128
## 4      4          allJHitsWithScore     character   426800             0
## 5      5   CD4 Lymph nodes CA Exp01       numeric   426800             0
## 6      6   CD4 Lymph nodes CA Exp02       numeric   426800             0
## 7      7   CD4 Lymph nodes CA Exp03       numeric   426800             0
## 8      8 CD4 Lymph nodes CAKR Exp02       numeric   426800             0
## 9      9 CD4 Lymph nodes CAKR Exp03       numeric   426800             0
## 10    10   CD4 Lymph nodes WT Exp02       numeric   426800             0
## 11    11   CD4 Lymph nodes WT Exp03       numeric   426800             0
## 12    12        CD4 Thymus CA Exp01       numeric   426800             0
## 13    13        CD4 Thymus CA Exp03       numeric   426800             0
## 14    14      CD4 Thymus CAKR Exp01       numeric   426800             0
## 15    15      CD4 Thymus CAKR Exp02       numeric   426800             0
## 16    16      CD4 Thymus CAKR Exp03       numeric   426800             0
## 17    17        CD4 Thymus WT Exp02       numeric   426800             0
## 18    18        CD4 Thymus WT Exp03       numeric   426800             0
## 19    19   CD8 Lymph nodes CA Exp02       numeric   426800             0
## 20    20   CD8 Lymph nodes CA Exp03       numeric   426800             0
## 21    21 CD8 Lymph nodes CAKR Exp02       numeric   426800             0
## 22    22 CD8 Lymph nodes CAKR Exp03       numeric   426800             0
## 23    23   CD8 Lymph nodes WT Exp01       numeric   426800             0
## 24    24   CD8 Lymph nodes WT Exp02       numeric   426800             0
## 25    25   CD8 Lymph nodes WT Exp03       numeric   426800             0
## 26    26        CD8 Thymus CA Exp01       numeric   426800             0
## 27    27        CD8 Thymus CA Exp03       numeric   426800             0
## 28    28      CD8 Thymus CAKR Exp02       numeric   426800             0
## 29    29      CD8 Thymus CAKR Exp03       numeric   426800             0
## 30    30        CD8 Thymus WT Exp01       numeric   426800             0
## 31    31        CD8 Thymus WT Exp02       numeric   426800             0
## 32    32        CD8 Thymus WT Exp03       numeric   426800             0
##    Per_of_Missing No_of_distinct_values
## 1           0.000                426800
## 2           0.000                 23763
## 3           0.223                   214
## 4           0.000                  2168
## 5           0.000                    29
## 6           0.000                    21
## 7           0.000                    33
## 8           0.000                    20
## 9           0.000                    38
## 10          0.000                    30
## 11          0.000                    25
## 12          0.000                    24
## 13          0.000                    42
## 14          0.000                    19
## 15          0.000                    14
## 16          0.000                    39
## 17          0.000                    17
## 18          0.000                    15
## 19          0.000                    24
## 20          0.000                    25
## 21          0.000                    24
## 22          0.000                    32
## 23          0.000                    22
## 24          0.000                    32
## 25          0.000                    23
## 26          0.000                     9
## 27          0.000                    20
## 28          0.000                    14
## 29          0.000                    14
## 30          0.000                    16
## 31          0.000                     9
## 32          0.000                    20

Prop tables and less than 5

Prop table TRA

excel_count_table_tra3 <- excel_count_table_tra2 %>% mutate(nkt_trav11_traj18 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11") & (grepl(allJHitsWithScore, pattern = "TRAJ18"))),"yes","no"))

excel_count_table_tra4 <- excel_count_table_tra3 %>% filter(nkt_trav11_traj18 == "no")

count_table_tra4 <- as.matrix(excel_count_table_tra4[,5:32])
rownames(count_table_tra4) <- excel_count_table_tra4$aaSeqCDR3

tra4_norm <- scale(count_table_tra4, center=FALSE, scale=colSums(count_table_tra4))

prop.table.tra <- cbind(tra4_norm, excel_count_table_tra4 %>% select(-starts_with("CD")) ) 

prop.table.tra2 <- prop.table.tra %>% dplyr::select(29, 31, 30, 32, 1:28)

prop.table.tra2 %>% head
##                     aaSeqCDR3 allJHitsWithScore allDHitsWithScore
## CAASASSGSWQLIF CAASASSGSWQLIF  TRAJ22*00(303.6)              <NA>
## CAASNMGYKLTF     CAASNMGYKLTF   TRAJ9*00(284.4)              <NA>
## CAVSASSGSWQLIF CAVSASSGSWQLIF  TRAJ22*00(304.2)              <NA>
## CAASDNYAQGLTF   CAASDNYAQGLTF    TRAJ26*00(284)              <NA>
## CALSDRYNQGKLIF CALSDRYNQGKLIF  TRAJ23*00(272.3)              <NA>
## CAASDDTNAYKVIF CAASDDTNAYKVIF  TRAJ30*00(295.7)              <NA>
##                                                                                                   allVHitsWithScore
## CAASASSGSWQLIF TRAV14N-2*00(1026.1),TRAV14D-2*00(1017.8),TRAV14D-1*00(973.4),TRAV14N-1*00(973.4),TRAV14-2*00(944.4)
## CAASNMGYKLTF                                                                   TRAV7-2*00(731.5),TRAV7D-2*00(731.5)
## CAVSASSGSWQLIF                                              TRAV3D-3*00(975.2),TRAV3N-3*00(975.2),TRAV3-3*00(944.1)
## CAASDNYAQGLTF                       TRAV14D-1*00(878.2),TRAV14N-1*00(878.2),TRAV14D-2*00(848.2),TRAV14N-2*00(848.2)
## CALSDRYNQGKLIF                                                            TRAV12D-3*00(1377.3),TRAV12N-3*00(1377.3)
## CAASDDTNAYKVIF                                                                                   TRAV14-1*00(966.7)
##                CD4 Lymph nodes CA Exp01 CD4 Lymph nodes CA Exp02
## CAASASSGSWQLIF             2.727925e-03             0.0035273369
## CAASNMGYKLTF               1.148600e-03             0.0016166961
## CAVSASSGSWQLIF             7.178751e-04             0.0007348618
## CAASDNYAQGLTF              2.871500e-04             0.0000000000
## CALSDRYNQGKLIF             7.178751e-05             0.0001469724
## CAASDDTNAYKVIF             0.000000e+00             0.0000000000
##                CD4 Lymph nodes CA Exp03 CD4 Lymph nodes CAKR Exp02
## CAASASSGSWQLIF             3.841094e-03               0.0027958993
## CAASNMGYKLTF               1.104972e-03               0.0013979497
## CAVSASSGSWQLIF             3.157064e-04               0.0004659832
## CAASDNYAQGLTF              3.157064e-04               0.0007455732
## CALSDRYNQGKLIF             4.209419e-04               0.0000000000
## CAASDDTNAYKVIF             5.261773e-05               0.0003727866
##                CD4 Lymph nodes CAKR Exp03 CD4 Lymph nodes WT Exp02
## CAASASSGSWQLIF               2.668321e-03             0.0035501049
## CAASNMGYKLTF                 8.687558e-04             0.0008068420
## CAVSASSGSWQLIF               5.584859e-04             0.0002420526
## CAASDNYAQGLTF                1.861620e-04             0.0001613684
## CALSDRYNQGKLIF               6.205399e-05             0.0000806842
## CAASDDTNAYKVIF               1.241080e-04             0.0000000000
##                CD4 Lymph nodes WT Exp03 CD4 Thymus CA Exp01 CD4 Thymus CA Exp03
## CAASASSGSWQLIF             3.291020e-03        0.0028328612        0.0022079929
## CAASNMGYKLTF               1.097007e-03        0.0012876642        0.0013247958
## CAVSASSGSWQLIF             2.611921e-04        0.0005150657        0.0003679988
## CAASDNYAQGLTF              1.567152e-04        0.0000000000        0.0003679988
## CALSDRYNQGKLIF             5.223842e-05        0.0000000000        0.0000000000
## CAASDDTNAYKVIF             1.567152e-04        0.0002575328        0.0000000000
##                CD4 Thymus CAKR Exp01 CD4 Thymus CAKR Exp02
## CAASASSGSWQLIF          0.0013199578          0.0037400655
## CAASNMGYKLTF            0.0013199578          0.0000000000
## CAVSASSGSWQLIF          0.0000000000          0.0023375409
## CAASDNYAQGLTF           0.0005279831          0.0000000000
## CALSDRYNQGKLIF          0.0000000000          0.0000000000
## CAASDDTNAYKVIF          0.0000000000          0.0004675082
##                CD4 Thymus CAKR Exp03 CD4 Thymus WT Exp02 CD4 Thymus WT Exp03
## CAASASSGSWQLIF          0.0033652602        0.0025633010        2.775089e-03
## CAASNMGYKLTF            0.0001294331        0.0013754298        5.808325e-04
## CAVSASSGSWQLIF          0.0005177323        0.0005001563        3.872217e-04
## CAASDNYAQGLTF           0.0000000000        0.0003125977        3.872217e-04
## CALSDRYNQGKLIF          0.0001294331        0.0000000000        6.453695e-05
## CAASDDTNAYKVIF          0.0000000000        0.0003751172        1.936108e-04
##                CD8 Lymph nodes CA Exp02 CD8 Lymph nodes CA Exp03
## CAASASSGSWQLIF             0.0099369387             0.0093881515
## CAASNMGYKLTF               0.0018154023             0.0020233085
## CAVSASSGSWQLIF             0.0021975922             0.0018614438
## CAASDNYAQGLTF              0.0009554749             0.0009711881
## CALSDRYNQGKLIF             0.0011465698             0.0008902557
## CAASDDTNAYKVIF             0.0012421173             0.0008093234
##                CD8 Lymph nodes CAKR Exp02 CD8 Lymph nodes CAKR Exp03
## CAASASSGSWQLIF               0.0080917874               0.0096842588
## CAASNMGYKLTF                 0.0019323671               0.0022348289
## CAVSASSGSWQLIF               0.0008454106               0.0020056157
## CAASDNYAQGLTF                0.0007246377               0.0009741562
## CALSDRYNQGKLIF               0.0016908213               0.0015471893
## CAASDDTNAYKVIF               0.0006038647               0.0004011231
##                CD8 Lymph nodes WT Exp01 CD8 Lymph nodes WT Exp02
## CAASASSGSWQLIF              0.009602698              0.011932808
## CAASNMGYKLTF                0.002492303              0.001946123
## CAVSASSGSWQLIF              0.001905879              0.001894909
## CAASDNYAQGLTF               0.001612667              0.001177917
## CALSDRYNQGKLIF              0.003665152              0.003021612
## CAASDDTNAYKVIF              0.001319455              0.000870634
##                CD8 Lymph nodes WT Exp03 CD8 Thymus CA Exp01 CD8 Thymus CA Exp03
## CAASASSGSWQLIF             0.0097454014        0.0062111801        0.0073130455
## CAASNMGYKLTF               0.0021318066        0.0014614541        0.0025163167
## CAVSASSGSWQLIF             0.0019490803        0.0014614541        0.0008649839
## CAASDNYAQGLTF              0.0012790839        0.0010960906        0.0004718094
## CALSDRYNQGKLIF             0.0024363503        0.0003653635        0.0005504443
## CAASDDTNAYKVIF             0.0007918139        0.0007307271        0.0009436188
##                CD8 Thymus CAKR Exp02 CD8 Thymus CAKR Exp03 CD8 Thymus WT Exp01
## CAASASSGSWQLIF          0.0084643289          0.0065963061        0.0088170463
## CAASNMGYKLTF            0.0008061266          0.0015077271        0.0020992967
## CAVSASSGSWQLIF          0.0016122531          0.0009423294        0.0019943319
## CAASDNYAQGLTF           0.0012091898          0.0011307953        0.0015744726
## CALSDRYNQGKLIF          0.0004030633          0.0005653977        0.0016794374
## CAASDDTNAYKVIF          0.0016122531          0.0016961930        0.0007347539
##                CD8 Thymus WT Exp02 CD8 Thymus WT Exp03
## CAASASSGSWQLIF        0.0053846154         0.007650815
## CAASNMGYKLTF          0.0015384615         0.001761339
## CAVSASSGSWQLIF        0.0015384615         0.001541171
## CAASDNYAQGLTF         0.0023076923         0.001431088
## CALSDRYNQGKLIF        0.0019230769         0.001155878
## CAASDDTNAYKVIF        0.0003846154         0.001265962
ExpData(data=prop.table.tra2, type=2)
##    Index              Variable_Name Variable_Type Sample_n Missing_Count
## 1      1                  aaSeqCDR3     character   114630             0
## 2      2          allJHitsWithScore     character   114630             0
## 3      3          allDHitsWithScore     character     4769        109861
## 4      4          allVHitsWithScore     character   114630             0
## 5      5   CD4 Lymph nodes CA Exp01       numeric   114630             0
## 6      6   CD4 Lymph nodes CA Exp02       numeric   114630             0
## 7      7   CD4 Lymph nodes CA Exp03       numeric   114630             0
## 8      8 CD4 Lymph nodes CAKR Exp02       numeric   114630             0
## 9      9 CD4 Lymph nodes CAKR Exp03       numeric   114630             0
## 10    10   CD4 Lymph nodes WT Exp02       numeric   114630             0
## 11    11   CD4 Lymph nodes WT Exp03       numeric   114630             0
## 12    12        CD4 Thymus CA Exp01       numeric   114630             0
## 13    13        CD4 Thymus CA Exp03       numeric   114630             0
## 14    14      CD4 Thymus CAKR Exp01       numeric   114630             0
## 15    15      CD4 Thymus CAKR Exp02       numeric   114630             0
## 16    16      CD4 Thymus CAKR Exp03       numeric   114630             0
## 17    17        CD4 Thymus WT Exp02       numeric   114630             0
## 18    18        CD4 Thymus WT Exp03       numeric   114630             0
## 19    19   CD8 Lymph nodes CA Exp02       numeric   114630             0
## 20    20   CD8 Lymph nodes CA Exp03       numeric   114630             0
## 21    21 CD8 Lymph nodes CAKR Exp02       numeric   114630             0
## 22    22 CD8 Lymph nodes CAKR Exp03       numeric   114630             0
## 23    23   CD8 Lymph nodes WT Exp01       numeric   114630             0
## 24    24   CD8 Lymph nodes WT Exp02       numeric   114630             0
## 25    25   CD8 Lymph nodes WT Exp03       numeric   114630             0
## 26    26        CD8 Thymus CA Exp01       numeric   114630             0
## 27    27        CD8 Thymus CA Exp03       numeric   114630             0
## 28    28      CD8 Thymus CAKR Exp02       numeric   114630             0
## 29    29      CD8 Thymus CAKR Exp03       numeric   114630             0
## 30    30        CD8 Thymus WT Exp01       numeric   114630             0
## 31    31        CD8 Thymus WT Exp02       numeric   114630             0
## 32    32        CD8 Thymus WT Exp03       numeric   114630             0
##    Per_of_Missing No_of_distinct_values
## 1           0.000                114630
## 2           0.000                  2972
## 3           0.958                   110
## 4           0.000                 31170
## 5           0.000                    27
## 6           0.000                    21
## 7           0.000                    36
## 8           0.000                    25
## 9           0.000                    34
## 10          0.000                    28
## 11          0.000                    34
## 12          0.000                    12
## 13          0.000                    29
## 14          0.000                    11
## 15          0.000                    10
## 16          0.000                    18
## 17          0.000                    28
## 18          0.000                    25
## 19          0.000                    21
## 20          0.000                    23
## 21          0.000                    17
## 22          0.000                    25
## 23          0.000                    26
## 24          0.000                    31
## 25          0.000                    30
## 26          0.000                     9
## 27          0.000                    23
## 28          0.000                     9
## 29          0.000                    11
## 30          0.000                    22
## 31          0.000                     9
## 32          0.000                    30

Prop table TRB

excel_count_table_trb3 <- excel_count_table_trb2 %>% 
  mutate(nkt_trav11_traj18 = "no")

count_table_trb4 <- as.matrix(excel_count_table_trb3[,5:32])
rownames(count_table_trb4) <- excel_count_table_trb3$aaSeqCDR3


trb4_norm <- scale(count_table_trb4, center=FALSE, scale=colSums(count_table_trb4))

prop.table.trb <- cbind(trb4_norm, excel_count_table_trb %>% select(-starts_with("CD")) ) 
colnames(prop.table.trb)
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allVHitsWithScore"         
## [31] "allDHitsWithScore"          "allJHitsWithScore"
prop.table.trb2 <- prop.table.trb %>% dplyr::select(29, 31, 30, 32, 1:28)


prop.table.trb2 %>% head
##                     aaSeqCDR3 allDHitsWithScore
## CASSDSAETLYF     CASSDSAETLYF              <NA>
## CASSDAEQFF         CASSDAEQFF              <NA>
## CASSDAGYEQYF     CASSDAGYEQYF              <NA>
## CASSDWGGYAEQFF CASSDWGGYAEQFF      TRBD2*00(45)
## CASSDSGGQDTQYF CASSDSGGQDTQYF      TRBD2*00(35)
## CASRDRNTEVFF     CASRDRNTEVFF      TRBD1*00(30)
##                                    allVHitsWithScore allJHitsWithScore
## CASSDSAETLYF   TRBV13-3*00(755.1),TRBV13-1*00(664.5)   TRBJ2-3*00(245)
## CASSDAEQFF                        TRBV13-1*00(819.6) TRBJ2-1*00(219.2)
## CASSDAGYEQYF                     TRBV13-1*00(1283.8)   TRBJ2-7*00(215)
## CASSDWGGYAEQFF                      TRBV13-1*00(676)   TRBJ2-1*00(235)
## CASSDSGGQDTQYF                    TRBV13-3*00(886.3)   TRBJ2-5*00(235)
## CASRDRNTEVFF                        TRBV13-3*00(979) TRBJ1-1*00(232.7)
##                CD4 Lymph nodes CA Exp01 CD4 Lymph nodes CA Exp02
## CASSDSAETLYF               1.729847e-04             5.154108e-05
## CASSDAEQFF                 0.000000e+00             0.000000e+00
## CASSDAGYEQYF               0.000000e+00             5.154108e-05
## CASSDWGGYAEQFF             7.413631e-05             5.154108e-05
## CASSDSGGQDTQYF             2.471210e-05             0.000000e+00
## CASRDRNTEVFF               1.482726e-04             5.154108e-05
##                CD4 Lymph nodes CA Exp03 CD4 Lymph nodes CAKR Exp02
## CASSDSAETLYF               1.141010e-04               3.152287e-05
## CASSDAEQFF                 9.508415e-05               0.000000e+00
## CASSDAGYEQYF               3.803366e-05               0.000000e+00
## CASSDWGGYAEQFF             1.901683e-05               1.260915e-04
## CASSDSGGQDTQYF             3.803366e-05               3.152287e-05
## CASRDRNTEVFF               5.705049e-05               9.456861e-05
##                CD4 Lymph nodes CAKR Exp03 CD4 Lymph nodes WT Exp02
## CASSDSAETLYF                 2.747190e-04             2.637618e-05
## CASSDAEQFF                   2.289325e-05             5.275235e-05
## CASSDAGYEQYF                 6.867975e-05             2.637618e-04
## CASSDWGGYAEQFF               0.000000e+00             0.000000e+00
## CASSDSGGQDTQYF               0.000000e+00             0.000000e+00
## CASRDRNTEVFF                 0.000000e+00             0.000000e+00
##                CD4 Lymph nodes WT Exp03 CD4 Thymus CA Exp01 CD4 Thymus CA Exp03
## CASSDSAETLYF               2.293368e-04        2.366397e-04        6.682259e-05
## CASSDAEQFF                 2.293368e-05        7.887991e-05        4.454839e-05
## CASSDAGYEQYF               2.293368e-05        3.549596e-04        1.113710e-04
## CASSDWGGYAEQFF             0.000000e+00        1.577598e-04        2.227420e-05
## CASSDSGGQDTQYF             0.000000e+00        0.000000e+00        0.000000e+00
## CASRDRNTEVFF               6.880103e-05        0.000000e+00        4.454839e-05
##                CD4 Thymus CAKR Exp01 CD4 Thymus CAKR Exp02
## CASSDSAETLYF            8.731718e-05                     0
## CASSDAEQFF              0.000000e+00                     0
## CASSDAGYEQYF            4.365859e-05                     0
## CASSDWGGYAEQFF          0.000000e+00                     0
## CASSDSGGQDTQYF          0.000000e+00                     0
## CASRDRNTEVFF            0.000000e+00                     0
##                CD4 Thymus CAKR Exp03 CD4 Thymus WT Exp02 CD4 Thymus WT Exp03
## CASSDSAETLYF            0.0000000000        4.972527e-05        2.538554e-04
## CASSDAEQFF              0.0000000000        2.486263e-05        3.173193e-05
## CASSDAGYEQYF            0.0000300075        2.486263e-05        0.000000e+00
## CASSDWGGYAEQFF          0.0000000000        2.486263e-05        3.173193e-05
## CASSDSGGQDTQYF          0.0000000000        2.486263e-05        0.000000e+00
## CASRDRNTEVFF            0.0002100525        0.000000e+00        3.173193e-05
##                CD8 Lymph nodes CA Exp02 CD8 Lymph nodes CA Exp03
## CASSDSAETLYF               3.846495e-04             3.699984e-04
## CASSDAEQFF                 2.662958e-04             1.849992e-04
## CASSDAGYEQYF               3.254727e-04             2.642846e-04
## CASSDWGGYAEQFF             1.183537e-04             2.642846e-04
## CASSDSGGQDTQYF             5.917685e-05             5.285692e-05
## CASRDRNTEVFF               2.367074e-04             5.285692e-05
##                CD8 Lymph nodes CAKR Exp02 CD8 Lymph nodes CAKR Exp03
## CASSDSAETLYF                 3.195909e-04               2.544113e-04
## CASSDAEQFF                   1.452686e-04               7.268895e-05
## CASSDAGYEQYF                 3.776984e-04               2.544113e-04
## CASSDWGGYAEQFF               9.878265e-04               4.179614e-04
## CASSDSGGQDTQYF               2.905372e-05               7.268895e-05
## CASRDRNTEVFF                 1.452686e-04               1.272057e-04
##                CD8 Lymph nodes WT Exp01 CD8 Lymph nodes WT Exp02
## CASSDSAETLYF               0.0005254621             3.488707e-04
## CASSDAEQFF                 0.0002741541             1.365146e-04
## CASSDAGYEQYF               0.0005483082             6.219000e-04
## CASSDWGGYAEQFF             0.0003198465             2.730293e-04
## CASSDSGGQDTQYF             0.0003198465             4.550488e-05
## CASRDRNTEVFF               0.0002056156             4.550488e-05
##                CD8 Lymph nodes WT Exp03 CD8 Thymus CA Exp01 CD8 Thymus CA Exp03
## CASSDSAETLYF               0.0002286917        0.0003316475        1.856493e-04
## CASSDAEQFF                 0.0001829533        0.0002487356        9.282465e-05
## CASSDAGYEQYF               0.0005031216        0.0003316475        1.856493e-04
## CASSDWGGYAEQFF             0.0002058225        0.0001658237        3.094155e-05
## CASSDSGGQDTQYF             0.0000000000        0.0000000000        3.094155e-05
## CASRDRNTEVFF               0.0001372150        0.0000000000        1.237662e-04
##                CD8 Thymus CAKR Exp02 CD8 Thymus CAKR Exp03 CD8 Thymus WT Exp01
## CASSDSAETLYF            0.0000000000          0.0000000000        1.804457e-04
## CASSDAEQFF              0.0002597628          0.0002214839        3.308171e-04
## CASSDAGYEQYF            0.0000000000          0.0001661130        4.210400e-04
## CASSDWGGYAEQFF          0.0003463503          0.0002214839        2.706686e-04
## CASSDSGGQDTQYF          0.0000000000          0.0001107420        6.014857e-05
## CASRDRNTEVFF            0.0000000000          0.0000000000        1.202971e-04
##                CD8 Thymus WT Exp02 CD8 Thymus WT Exp03
## CASSDSAETLYF          4.479484e-04        2.347268e-04
## CASSDAEQFF            2.687690e-04        8.535518e-05
## CASSDAGYEQYF          3.583587e-04        2.774043e-04
## CASSDWGGYAEQFF        2.687690e-04        6.401639e-05
## CASSDSGGQDTQYF        0.000000e+00        2.133880e-05
## CASRDRNTEVFF          8.958968e-05        1.066940e-04
ExpData(data=prop.table.trb2, type=2)
##    Index              Variable_Name Variable_Type Sample_n Missing_Count
## 1      1                  aaSeqCDR3     character   426800             0
## 2      2          allDHitsWithScore     character   331672         95128
## 3      3          allVHitsWithScore     character   426800             0
## 4      4          allJHitsWithScore     character   426800             0
## 5      5   CD4 Lymph nodes CA Exp01       numeric   426800             0
## 6      6   CD4 Lymph nodes CA Exp02       numeric   426800             0
## 7      7   CD4 Lymph nodes CA Exp03       numeric   426800             0
## 8      8 CD4 Lymph nodes CAKR Exp02       numeric   426800             0
## 9      9 CD4 Lymph nodes CAKR Exp03       numeric   426800             0
## 10    10   CD4 Lymph nodes WT Exp02       numeric   426800             0
## 11    11   CD4 Lymph nodes WT Exp03       numeric   426800             0
## 12    12        CD4 Thymus CA Exp01       numeric   426800             0
## 13    13        CD4 Thymus CA Exp03       numeric   426800             0
## 14    14      CD4 Thymus CAKR Exp01       numeric   426800             0
## 15    15      CD4 Thymus CAKR Exp02       numeric   426800             0
## 16    16      CD4 Thymus CAKR Exp03       numeric   426800             0
## 17    17        CD4 Thymus WT Exp02       numeric   426800             0
## 18    18        CD4 Thymus WT Exp03       numeric   426800             0
## 19    19   CD8 Lymph nodes CA Exp02       numeric   426800             0
## 20    20   CD8 Lymph nodes CA Exp03       numeric   426800             0
## 21    21 CD8 Lymph nodes CAKR Exp02       numeric   426800             0
## 22    22 CD8 Lymph nodes CAKR Exp03       numeric   426800             0
## 23    23   CD8 Lymph nodes WT Exp01       numeric   426800             0
## 24    24   CD8 Lymph nodes WT Exp02       numeric   426800             0
## 25    25   CD8 Lymph nodes WT Exp03       numeric   426800             0
## 26    26        CD8 Thymus CA Exp01       numeric   426800             0
## 27    27        CD8 Thymus CA Exp03       numeric   426800             0
## 28    28      CD8 Thymus CAKR Exp02       numeric   426800             0
## 29    29      CD8 Thymus CAKR Exp03       numeric   426800             0
## 30    30        CD8 Thymus WT Exp01       numeric   426800             0
## 31    31        CD8 Thymus WT Exp02       numeric   426800             0
## 32    32        CD8 Thymus WT Exp03       numeric   426800             0
##    Per_of_Missing No_of_distinct_values
## 1           0.000                426800
## 2           0.223                   214
## 3           0.000                 23763
## 4           0.000                  2168
## 5           0.000                    29
## 6           0.000                    21
## 7           0.000                    33
## 8           0.000                    20
## 9           0.000                    38
## 10          0.000                    30
## 11          0.000                    25
## 12          0.000                    24
## 13          0.000                    42
## 14          0.000                    19
## 15          0.000                    14
## 16          0.000                    39
## 17          0.000                    17
## 18          0.000                    15
## 19          0.000                    24
## 20          0.000                    25
## 21          0.000                    24
## 22          0.000                    32
## 23          0.000                    22
## 24          0.000                    32
## 25          0.000                    23
## 26          0.000                     9
## 27          0.000                    20
## 28          0.000                    14
## 29          0.000                    14
## 30          0.000                    16
## 31          0.000                     9
## 32          0.000                    20

less than 5 removed

CD4

Remove CDR3s with occurrence in less than 5 samples

# TRA
keep_tra_cd4 <- excel_count_table_tra %>% 
  mutate(nkt_trav11_traj18 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11") | 
    (grepl(allJHitsWithScore, pattern = "TRAJ18"))),"yes","no")) %>% 
  filter(nkt_trav11_traj18 == "no") %>% 
  select(aaSeqCDR3, starts_with("CD4")) %>% 
  mutate_at(vars(starts_with("CD4")), .funs = binary) 
 
keep_tra_cd4$sum <- rowSums((keep_tra_cd4 %>% select(-aaSeqCDR3)))

keep_tra_cd4$keep <- ifelse(keep_tra_cd4$sum>4,1,0)

keep_tra_cd4_sequences <- pull(keep_tra_cd4 %>% filter(keep == 1), aaSeqCDR3)

# TRB
keep_trb_cd4 <- excel_count_table_trb %>% select(aaSeqCDR3, starts_with("CD4")) %>%  mutate_at(vars(starts_with("CD4")), .funs = binary) 
 
keep_trb_cd4$sum <- rowSums((keep_trb_cd4 %>% select(-aaSeqCDR3)))

keep_trb_cd4$keep <- ifelse(keep_trb_cd4$sum>4,1,0)

keep_trb_cd4_sequences <- pull(keep_trb_cd4 %>% filter(keep == 1), aaSeqCDR3)

CD8

Remove CDR3s with occurrence in less than 5 samples

# TRA
keep_tra_cd8 <- excel_count_table_tra %>% 
  mutate(nkt_trav11_traj18 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11") | 
    (grepl(allJHitsWithScore, pattern = "TRAJ18"))),"yes","no")) %>% 
  filter(nkt_trav11_traj18 == "no") %>% 
  select(aaSeqCDR3, starts_with("CD8")) %>% 
  mutate_at(vars(starts_with("CD8")), .funs = binary) 

keep_tra_cd8$sum <- rowSums((keep_tra_cd8 %>% select(-aaSeqCDR3)))

keep_tra_cd8$keep <- ifelse(keep_tra_cd8$sum>4,1,0)

keep_tra_cd8_sequences <- pull(keep_tra_cd8 %>% filter(keep == 1), aaSeqCDR3)

# TRB
keep_trb_cd8 <- excel_count_table_trb %>% select(aaSeqCDR3, starts_with("CD8")) %>%  mutate_at(vars(starts_with("CD8")), .funs = binary) 
 
keep_trb_cd8$sum <- rowSums((keep_trb_cd8 %>% select(-aaSeqCDR3)))

keep_trb_cd8$keep <- ifelse(keep_trb_cd8$sum>4,1,0)

keep_trb_cd8_sequences <- pull(keep_trb_cd8 %>% filter(keep == 1), aaSeqCDR3)

TRB

excel_count_table_trb3_filter <- excel_count_table_trb2  %>% filter(aaSeqCDR3 %in% keep_trb_cd4_sequences | aaSeqCDR3 %in% keep_trb_cd8_sequences) 

count_table_trb4_filter <- as.matrix(excel_count_table_trb3_filter[,5:32])
rownames(count_table_trb4_filter) <- excel_count_table_trb3_filter$aaSeqCDR3

trb4_norm_filter <- scale(count_table_trb4_filter, center=FALSE, scale=colSums(count_table_trb4_filter))

prop.table.trb_filter <- cbind(trb4_norm_filter, excel_count_table_trb3_filter %>% select(-starts_with("CD")) )

TRA

excel_count_table_tra3_filter <- excel_count_table_tra3  %>% filter(aaSeqCDR3 %in% keep_tra_cd4_sequences | aaSeqCDR3 %in% keep_tra_cd8_sequences) %>% filter(nkt_trav11_traj18 == "no")

count_table_tra4_filter <- as.matrix(excel_count_table_tra3_filter[,5:32])
rownames(count_table_tra4_filter) <- excel_count_table_tra3_filter$aaSeqCDR3

tra4_norm_filter <- scale(count_table_tra4_filter, center=FALSE, scale=colSums(count_table_tra4_filter))

prop.table.tra_filter <- cbind(tra4_norm_filter, excel_count_table_tra3_filter %>% select(-starts_with("CD")) )

colnames(prop.table.tra_filter)
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allDHitsWithScore"         
## [31] "allJHitsWithScore"          "allVHitsWithScore"         
## [33] "nkt_trav11_traj18"

Gene segment usage analysis

immdata_tra <- readRDS("immdata_tra.rds")
immdata_trb <- readRDS("immdata_trb.rds")

TRA

### Thymus

cd4_thymus <- geneUsage(repFilter(immdata_tra, 
                                 .method = "by.meta", 
                                 .query = list(Organ = include('Thymus'), Cell_type = include('CD4')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trav")

vis(cd4_thymus, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_tra$meta) + ggtitle("TRA Thymus CD4") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

#ggsave("./final_fig/gene_usage_new/cd4_thymus_tra_just1segment.png", width = 60, height = 12, units = "cm") 
#ggsave("./final_fig/gene_usage_new/cd4_thymus_tra_just1segment.svg", width = 60, height = 12, units = "cm") 


### LN
cd4_ln <- geneUsage(repFilter(immdata_tra, 
                                 .method = "by.meta", 
                                 .query = list(Organ = exclude('Thymus'), Cell_type = include('CD4')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trav")

vis(cd4_ln, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_tra$meta) + ggtitle("TRA LN CD4") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

#ggsave("./final_fig/gene_usage_new/cd4_ln_tra_just1segment.png", width = 60, height = 12, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd4_ln_tra_just1segment.svg", width = 60, height = 12, units = "cm")

cd8_thymus <- geneUsage(repFilter(immdata_tra, 
                                 .method = "by.meta", 
                                 .query = list(Organ = include('Thymus'), Cell_type = include('CD8')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trav")

vis(cd8_thymus, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_tra$meta) + ggtitle("TRA Thymus CD8") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

#ggsave("./final_fig/gene_usage_new/cd8_thymus_tra_just1segment.png", width = 60, height = 12, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd8_thymus_tra_just1segment.svg", width = 60, height = 12, units = "cm")


cd8_ln <- geneUsage(repFilter(immdata_tra, 
                                 .method = "by.meta", 
                                 .query = list(Organ = exclude('Thymus'), Cell_type = include('CD8')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trav")

vis(cd8_ln, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_tra$meta) + ggtitle("TRA LN CD8") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

#ggsave("./final_fig/gene_usage_new/cd8_ln_tra_just1segment.png", width = 60, height = 12, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd8_ln_tra_just1segment.svg", width = 60, height = 12, units = "cm")

TRB

### Thymus

cd4_thymus <- geneUsage(repFilter(immdata_trb, 
                                 .method = "by.meta", 
                                 .query = list(Organ = include('Thymus'), Cell_type = include('CD4')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trbv")

vis(cd4_thymus, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_trb$meta, .test = F) + ggtitle("TRB Thymus CD4") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70")) 

#ggsave("./final_fig/gene_usage_new/cd4_thymus_trb_just1segment.png", width = 20, height = 10, units = "cm") 
#ggsave("./final_fig/gene_usage_new/cd4_thymus_trb_just1segment.svg", width = 20, height = 10, units = "cm")

### LN
cd4_ln <- geneUsage(repFilter(immdata_trb, 
                                 .method = "by.meta", 
                                 .query = list(Organ = exclude('Thymus'), Cell_type = include('CD4')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trbv")

vis(cd4_ln, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_trb$meta, .test = F) + ggtitle("TRB LN CD4") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

#ggsave("./final_fig/gene_usage_new/cd4_ln_trb_just1segment.png", width = 20, height = 10, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd4_ln_trb_just1segment.svg", width = 20, height = 10, units = "cm")


cd8_thymus <- geneUsage(repFilter(immdata_trb, 
                                 .method = "by.meta", 
                                 .query = list(Organ = include('Thymus'), Cell_type = include('CD8')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trbv")

vis(cd8_thymus, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_trb$meta, .test = F) + ggtitle("TRB Thymus CD8") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

#ggsave("./final_fig/gene_usage_new/cd8_thymus_trb_just1segment.png", width = 20, height = 10, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd8_thymus_trb_just1segment.svg", width = 20, height = 10, units = "cm")


cd8_ln <- geneUsage(repFilter(immdata_trb, 
                                 .method = "by.meta", 
                                 .query = list(Organ = exclude('Thymus'), Cell_type = include('CD8')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trbv")

vis(cd8_ln, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_trb$meta, .test = F) + ggtitle("TRB LN CD8") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

#ggsave("./final_fig/gene_usage_new/cd8_ln_trb_just1segment.png", width = 20, height = 10, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd8_ln_trb_just1segment.svg", width = 20, height = 10, units = "cm")

Analysis of NKT cells

Next, we will identify gene segments that are typical for invariant NKT cells. These segments are TRAV11 and TRAJ18 for TRA and TRBV1, TRBV13-2 and TRBV29 for TRB. Please, note that some of the included TRB chains are used by conventional cells as well.

Counts of NKT gene segments:

# NKT analysis
merged_repertoires <- merged_repertoires %>% 
  mutate(is_nkt = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11")) & (grepl(allJHitsWithScore, pattern = "TRAJ18")) |
    (grepl(allVHitsWithScore, pattern = "TRBV13-2")) |
      (grepl(allVHitsWithScore, pattern = "TRBV1\\*")) |
      (grepl(allVHitsWithScore, pattern = "TRBV29"))  ,"yes","no")) %>% 
  mutate(nkt_trav11_traj18 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11") & (grepl(allJHitsWithScore, pattern = "TRAJ18"))),"yes","no")) %>% mutate(nkt_trbv13_2 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRBV13-2")),"yes","no")) %>% mutate(nkt_trbv29 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRBV29")),"yes","no")) %>% mutate(nkt_trbv1 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRBV1\\*")),"yes","no"))

md <- md %>% select(Exp, Organ, Cell_type, Mouse_strain, num_id) %>% mutate_at(vars("num_id"), as.numeric)

# All TRB NKT
nkt_table_trb <- merged_repertoires %>% 
   filter(chain == "TRB") %>% 
 mutate(new_name = paste(Exp, Organ, Cell_type, Mouse_strain)) %>% 
  dplyr::select(new_name, cloneCount, is_nkt, num_id) %>% 
  uncount(cloneCount) %>% 
  group_by(num_id, is_nkt) %>% 
    summarise(n = n()) %>% 
mutate(freq = n / sum(n)) %>% dplyr::filter(is_nkt == "yes") %>% 
  ungroup %>% 
  left_join(md) %>% 
  unique %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 

# TRAV11 TRAJ18
nkt_table_trav11_traj18 <- merged_repertoires %>% 
   filter(chain == "TRA") %>% 
 mutate(new_name = paste(Exp, Organ, Cell_type, Mouse_strain)) %>% 
  dplyr::select(new_name, cloneCount, nkt_trav11_traj18, num_id) %>% 
  uncount(cloneCount) %>% 
  group_by(num_id, nkt_trav11_traj18) %>% 
    summarise(n = n()) %>% 
mutate(freq = n / sum(n)) %>% dplyr::filter(nkt_trav11_traj18 == "yes") %>% 
  ungroup %>% 
  left_join(md) %>% 
  unique %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 

# TRBV1
nkt_table_trbv1 <- merged_repertoires %>% 
   filter(chain == "TRB") %>% 
 mutate(new_name = paste(Exp, Organ, Cell_type, Mouse_strain)) %>% 
  dplyr::select(new_name, cloneCount, nkt_trbv1, num_id) %>% 
  uncount(cloneCount) %>% 
  group_by(num_id, nkt_trbv1) %>% 
    summarise(n = n()) %>% 
mutate(freq = n / sum(n)) %>% dplyr::filter(nkt_trbv1 == "yes") %>% 
  ungroup %>% 
  left_join(md) %>% 
  unique %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 

# TRBV13-2
nkt_table_trbv13_2 <- merged_repertoires %>% 
   filter(chain == "TRB") %>% 
 mutate(new_name = paste(Exp, Organ, Cell_type, Mouse_strain)) %>% 
  dplyr::select(new_name, cloneCount, nkt_trbv13_2, num_id) %>% 
  uncount(cloneCount) %>% 
  group_by(num_id, nkt_trbv13_2) %>% 
    summarise(n = n()) %>% 
mutate(freq = n / sum(n)) %>% dplyr::filter(nkt_trbv13_2 == "yes") %>% 
  ungroup %>% 
  left_join(md) %>% 
  unique %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 


# TRBV29
nkt_table_trbv29 <- merged_repertoires %>% 
   filter(chain == "TRB") %>% 
 mutate(new_name = paste(Exp, Organ, Cell_type, Mouse_strain)) %>% 
  dplyr::select(new_name, cloneCount, nkt_trbv29, num_id) %>% 
  uncount(cloneCount) %>% 
  group_by(num_id, nkt_trbv29) %>% 
    summarise(n = n()) %>% 
mutate(freq = n / sum(n)) %>% dplyr::filter(nkt_trbv29 == "yes") %>% 
  ungroup %>% 
  left_join(md) %>% 
  unique %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 

levels_cd4 <- c("CD4 Thymus WT", "CD4 Thymus CA", "CD4 Thymus CAKR", "CD4 Lymph nodes WT", "CD4 Lymph nodes CA", "CD4 Lymph nodes CAKR")
levels_cd8 <- c("CD8 Thymus WT", "CD8 Thymus CA", "CD8 Thymus CAKR", "CD8 Lymph nodes WT", "CD8 Lymph nodes CA", "CD8 Lymph nodes CAKR")
nkt_table_trb$nkt_sample <- "all_nkt_trb"
nkt_table_trav11_traj18$nkt_sample <- "trav11_traj18"
nkt_table_trbv1$nkt_sample <- "trbv1"
nkt_table_trbv13_2$nkt_sample <- "trbv13_2"
nkt_table_trbv29$nkt_sample <- "trbv29"

nkt_table <- cbind(nkt_table_trb$num_id, nkt_table_trb$sample_type, nkt_table_trb$Exp, format(nkt_table_trb$freq, digits = 2), 
                   format(nkt_table_trav11_traj18$freq, digits = 2),
format(nkt_table_trbv1$freq, digits = 2), 
format(nkt_table_trbv13_2$freq, digits = 2),
                   format(nkt_table_trbv29$freq, digits = 2)) %>% as.data.frame

colnames(nkt_table) <- c("num_id","Sample","Exp","All TRB NKT", "TRAV11-TRAJ18", "TRBV1","TRBV13-2","TRBV29")
nkt_table <- nkt_table %>% mutate(Sample = paste(Sample, Exp)) %>% select(-num_id, -Exp) %>% arrange(Sample)

kable(nkt_table, format = "html") %>%
  kable_styling(full_width = F, font_size = 11, 
                bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Sample All TRB NKT TRAV11-TRAJ18 TRBV1 TRBV13-2 TRBV29
CD4 Lymph nodes CA 1 0.18 0.02574 0.032 0.120 0.027
CD4 Lymph nodes CA 2 0.19 0.02326 0.031 0.124 0.031
CD4 Lymph nodes CA 3 0.16 0.02046 0.024 0.109 0.028
CD4 Lymph nodes CAKR 2 0.19 0.01704 0.031 0.127 0.029
CD4 Lymph nodes CAKR 3 0.18 0.02357 0.034 0.120 0.030
CD4 Lymph nodes WT 2 0.17 0.00474 0.028 0.114 0.029
CD4 Lymph nodes WT 3 0.17 0.00307 0.031 0.111 0.029
CD4 Thymus CA 1 0.36 0.25798 0.032 0.228 0.101
CD4 Thymus CA 3 0.38 0.26069 0.033 0.219 0.124
CD4 Thymus CAKR 1 0.30 0.22058 0.033 0.205 0.065
CD4 Thymus CAKR 2 0.32 0.38268 0.069 0.185 0.068
CD4 Thymus CAKR 3 0.38 0.23262 0.042 0.249 0.084
CD4 Thymus WT 2 0.23 0.08824 0.034 0.136 0.058
CD4 Thymus WT 3 0.19 0.06685 0.038 0.107 0.045
CD8 Lymph nodes CA 2 0.21 0.00076 0.028 0.084 0.099
CD8 Lymph nodes CA 3 0.20 0.00137 0.026 0.085 0.091
CD8 Lymph nodes CAKR 2 0.24 0.00084 0.030 0.122 0.087
CD8 Lymph nodes CAKR 3 0.20 0.00080 0.027 0.089 0.082
CD8 Lymph nodes WT 1 0.23 0.00117 0.028 0.108 0.096
CD8 Lymph nodes WT 2 0.21 0.00051 0.024 0.091 0.098
CD8 Lymph nodes WT 3 0.21 0.00091 0.024 0.089 0.095
CD8 Thymus CA 1 0.22 0.00219 0.029 0.116 0.078
CD8 Thymus CA 3 0.19 0.00102 0.024 0.084 0.078
CD8 Thymus CAKR 2 0.23 0.00521 0.028 0.124 0.076
CD8 Thymus CAKR 3 0.19 0.00188 0.032 0.105 0.056
CD8 Thymus WT 1 0.23 0.00126 0.022 0.111 0.101
CD8 Thymus WT 2 0.26 0.00115 0.027 0.129 0.102
CD8 Thymus WT 3 0.20 0.00126 0.023 0.086 0.090

All NKT

## CD4
p01 <- nkt_table_trb %>% 
  filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("NKT segments in CD4 TRB") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_all_nkt_trb_cd4.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_all_nkt_trb_cd4.eps", width = 8, height = 8, units = "cm")

## CD8
p02 <- nkt_table_trb %>% 
  filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("NKT segments in CD8 TRB") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_all_nkt_trb_cd8.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_all_nkt_trb_cd8.eps", width = 8, height = 8, units = "cm")

p01 + p02

TRAV11 TRAJ18

## CD4
p03 <- nkt_table_trav11_traj18 %>% 
  filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("TRAV11-TRAJ18 segments in CD4 cells") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trav11_traj18_tra_cd4.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trav11_traj18_tra_cd4.eps", width = 8, height = 8, units = "cm")

## CD8
p04 <- nkt_table_trav11_traj18 %>% 
  filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("TRAV11-TRAJ18 segments in CD8 cells") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,1)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trav11_traj18_tra_cd8.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trav11_traj18_tra_cd8.eps", width = 8, height = 8, units = "cm")

p03 + p04

TRBV1

## CD4
p05 <- nkt_table_trbv1 %>% 
  filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("CD4 TRBV1") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv1_cd4.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv1_cd4.eps", width = 8, height = 8, units = "cm")

## CD8
p05 <- nkt_table_trbv1 %>% 
  filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("CD8 TRBV1") + 
  theme_classic() + 
  ggtheme() + 
  
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv1_cd8.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv1_cd8.eps", width = 8, height = 8, units = "cm")

p05 + p06
## Error in eval(expr, envir, enclos): object 'p06' not found

TRBV13-2

## CD4
p07 <- nkt_table_trbv13_2 %>% 
  filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("TRBV13-2 segments in CD4 cells") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv13_2_cd4.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv13_2_cd4.eps", width = 8, height = 8, units = "cm")

## CD8
p08 <- nkt_table_trbv13_2 %>% 
  filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("TRBV13-2 segments in CD8 cells") + 
  theme_classic() + 
  ggtheme() + 
  
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv13_2_cd8.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv13_2_cd8.eps", width = 8, height = 8, units = "cm")

p07 + p08

TRBV29

## CD4
p09 <- nkt_table_trbv29 %>% 
  filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("CD4 TRBV29") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv29_cd4.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv29_cd4.eps", width = 8, height = 8, units = "cm")

## CD8
p10 <- nkt_table_trbv29 %>% 
  filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("CD8 TRBV29") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv29_cd8.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv29_cd8.eps", width = 8, height = 8, units = "cm")

p09 + p10

PCA

prop.table.tra2_filter.cd4 <- prop.table.tra_filter[,1:14]
prop.table.tra2_filter.cd4$sum <- rowSums(prop.table.tra2_filter.cd4)
prop.table.tra2_filter.cd4 <- prop.table.tra2_filter.cd4 %>% filter(sum >0) %>% select(-sum)

prop.table.trb2_filter.cd4 <- prop.table.trb_filter[,1:14]
prop.table.trb2_filter.cd4$sum <- rowSums(prop.table.trb2_filter.cd4)
prop.table.trb2_filter.cd4 <- prop.table.trb2_filter.cd4 %>% filter(sum >0) %>% select(-sum)

prop.table.tra2_filter.cd8 <- prop.table.tra_filter[,15:28]
prop.table.tra2_filter.cd8$sum <- rowSums(prop.table.tra2_filter.cd8)
prop.table.tra2_filter.cd8 <- prop.table.tra2_filter.cd8 %>% filter(sum >0) %>% select(-sum)

prop.table.trb2_filter.cd8 <- prop.table.trb_filter[,15:28]
prop.table.trb2_filter.cd8$sum <- rowSums(prop.table.trb2_filter.cd8)
prop.table.trb2_filter.cd8 <- prop.table.trb2_filter.cd8 %>% filter(sum >0) %>% select(-sum)

TRA + TRB together

CD4

prop.table.tra2_filter.merge <- rbind(prop.table.tra2_filter.cd4, prop.table.trb2_filter.cd4)
  
res.pca.merge.cd4 <- prcomp(t(prop.table.tra2_filter.merge), scale = TRUE, center = T)

mdres.pca.merge.cd4  <-  colnames(prop.table.tra2_filter.merge[,1:14])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))

fviz_pca_ind(res.pca.merge.cd4,
             col.ind = as.factor(mdres.pca.merge.cd4$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.merge.cd4$Organ),
                  color = as.factor(mdres.pca.merge.cd4$Strain))) + 
  scale_shape_manual(values = c(8,8,8,15,15)) +
  ggtheme() +
  theme(axis.text.x = element_text(angle = 0))

#ggsave("final_fig/pca/cd4.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd4.svg", width = 2.6, height = 1.7)

Zoom

fviz_pca_ind(res.pca.merge.cd4,
             col.ind = as.factor(mdres.pca.merge.cd4$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.merge.cd4$Organ),
                  color = as.factor(mdres.pca.merge.cd4$Strain))) + 
  scale_shape_manual(values = c(8,8,8,15,15)) +
  xlim(c(-30,0)) + ylim(-2,15) +
  ggtheme() +
  theme(axis.text.x = element_text(angle = 0))

#ggsave("final_fig/pca/cd4_zoom.png", width = 2.2, height = 1.4)
#ggsave("final_fig/pca/cd4_zoom.svg", width = 2.2, height = 1.4)

CD8

prop.table.tra2_filter.merge <- rbind(prop.table.tra2_filter.cd8, prop.table.trb2_filter.cd8)
  
res.pca.merge.cd8 <- prcomp(t(prop.table.tra2_filter.merge), scale = TRUE, center = T)

mdres.pca.merge.cd8  <-  colnames(prop.table.tra2_filter.merge[,1:14])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  

fviz_pca_ind(res.pca.merge.cd8,
             col.ind = as.factor(mdres.pca.merge.cd8$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.merge.cd8$Organ),
                  color = as.factor(mdres.pca.merge.cd8$Strain))) + 
  scale_shape_manual(values = c(8,8,8,15,15)) +
  ggtheme()

#ggsave("final_fig/pca/cd8.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd8.svg", width = 2.6, height = 1.7)

Zoom

fviz_pca_ind(res.pca.merge.cd8,
             col.ind = as.factor(mdres.pca.merge.cd8$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.merge.cd8$Organ),
                  color = as.factor(mdres.pca.merge.cd8$Strain))) + 
  scale_shape_manual(values = c(8,8,8,15,15)) +
  xlim(c(-32,-18)) + ylim(-4,5) +
  ggtheme()

#ggsave("final_fig/pca/cd8_zoom.png", width = 2.2, height = 1.4)
#ggsave("final_fig/pca/cd8_zoom.svg", width = 2.2, height = 1.4)

TRA

res.pca <- prcomp(t(prop.table.tra_filter[,1:28]), scale = TRUE)


md  <-  colnames(prop.table.tra_filter[,1:28])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))

CD4

## All

prop.table.tra2_filter.cd4 <- prop.table.tra_filter[,1:14]
prop.table.tra2_filter.cd4$sum <- rowSums(prop.table.tra2_filter.cd4)

prop.table.tra2_filter.cd4 <- prop.table.tra2_filter.cd4 %>% filter(sum >0) %>% select(-sum)

res.pca.tra.cd4 <- prcomp(t(prop.table.tra2_filter.cd4), scale = TRUE, center = T)

mdres.pca.tra.cd4  <-  colnames(prop.table.tra_filter[,1:14])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  
# LN
prop.table.tra2_filter %>% colnames
## Error in is.data.frame(x): object 'prop.table.tra2_filter' not found
prop.table.tra2_filter.cd4.ln <- prop.table.tra_filter[,1:7]
prop.table.tra2_filter.cd4.ln$sum <- rowSums(prop.table.tra2_filter.cd4.ln)

prop.table.tra2_filter.cd4.ln <- prop.table.tra2_filter.cd4.ln %>% filter(sum >0) %>% select(-sum)

res.pca.tra.cd4.ln <- prcomp(t(prop.table.tra2_filter.cd4.ln), scale = TRUE, center = T)

mdres.pca.tra.cd4.ln  <-  colnames(prop.table.tra_filter[,1:7])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  

fviz_pca_ind(res.pca.tra.cd4.ln,
             col.ind = as.factor(mdres.pca.tra.cd4.ln$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.tra.cd4.ln$Organ),
                  color = as.factor(mdres.pca.tra.cd4.ln$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd4_tra_ln.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd4_tra_ln.svg", width = 2.6, height = 1.7)

# Thymus

prop.table.tra2_filter.cd4.thy <- prop.table.tra_filter[,8:14]
prop.table.tra2_filter.cd4.thy$sum <- rowSums(prop.table.tra2_filter.cd4.thy)

prop.table.tra2_filter.cd4.thy <- prop.table.tra2_filter.cd4.thy %>% filter(sum >0) %>% select(-sum)

res.pca.tra.cd4.thy <- prcomp(t(prop.table.tra2_filter.cd4.thy), scale = TRUE, center = T)

mdres.pca.tra.cd4.thy  <-  colnames(prop.table.tra_filter[,8:14])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  
fviz_pca_ind(res.pca.tra.cd4.thy,
             col.ind = as.factor(mdres.pca.tra.cd4.thy$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.tra.cd4.thy$Organ),
                  color = as.factor(mdres.pca.tra.cd4.thy$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") +
  ggtheme()

#ggsave("final_fig/pca/cd4_tra_thy.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd4_tra_thy.svg", width = 2.6, height = 1.7)

CD8

## All

prop.table.tra2_filter.cd8 <- prop.table.tra_filter[,15:28]
prop.table.tra2_filter.cd8$sum <- rowSums(prop.table.tra2_filter.cd8)

prop.table.tra2_filter.cd8 <- prop.table.tra2_filter.cd8 %>% filter(sum >0) %>% select(-sum)

res.pca.tra.cd8 <- prcomp(t(prop.table.tra2_filter.cd8), scale = TRUE, center = T)

mdres.pca.tra.cd8  <-  colnames(prop.table.tra2_filter[,15:28])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
## Error in is.data.frame(x): object 'prop.table.tra2_filter' not found
# LN
prop.table.tra2_filter %>% colnames
## Error in is.data.frame(x): object 'prop.table.tra2_filter' not found
prop.table.tra2_filter.cd8.ln <- prop.table.tra_filter[,15:21]
prop.table.tra2_filter.cd8.ln$sum <- rowSums(prop.table.tra2_filter.cd8.ln)

prop.table.tra2_filter.cd8.ln <- prop.table.tra2_filter.cd8.ln %>% filter(sum >0) %>% select(-sum)

res.pca.tra.cd8.ln <- prcomp(t(prop.table.tra2_filter.cd8.ln), scale = TRUE, center = T)

mdres.pca.tra.cd8.ln  <-  colnames(prop.table.tra_filter[,15:21])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  
fviz_pca_ind(res.pca.tra.cd8.ln,
             col.ind = as.factor(mdres.pca.tra.cd8.ln$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.tra.cd8.ln$Organ),
                  color = as.factor(mdres.pca.tra.cd8.ln$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd8_tra_ln.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd8_tra_ln.svg", width = 2.6, height = 1.7)

# Thymus

prop.table.tra2_filter %>% colnames
## Error in is.data.frame(x): object 'prop.table.tra2_filter' not found
prop.table.tra2_filter.cd8.thy <- prop.table.tra_filter[,22:28]
prop.table.tra2_filter.cd8.thy$sum <- rowSums(prop.table.tra2_filter.cd8.thy)

prop.table.tra2_filter.cd8.thy <- prop.table.tra2_filter.cd8.thy %>% filter(sum >0) %>% select(-sum)

res.pca.tra.cd8.thy <- prcomp(t(prop.table.tra2_filter.cd8.thy), scale = TRUE, center = T)

mdres.pca.tra.cd8.thy  <-  colnames(prop.table.tra_filter[,22:28])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
 
fviz_pca_ind(res.pca.tra.cd8.thy,
             col.ind = as.factor(mdres.pca.tra.cd8.thy$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.tra.cd8.thy$Organ),
                  color = as.factor(mdres.pca.tra.cd8.thy$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd8_tra_thy.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd8_tra_thy.svg", width = 2.6, height = 1.7)

TRB

res.pca.trb <- prcomp(t(prop.table.trb_filter[,5:32]), scale = TRUE)
## Error in colMeans(x, na.rm = TRUE): 'x' must be numeric
md_trb  <-  colnames(prop.table.trb_filter[,5:32])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))

CD4

## All
colnames(prop.table.trb2_filter)
## Error in is.data.frame(x): object 'prop.table.trb2_filter' not found
prop.table.trb2_filter.cd4 <- prop.table.trb_filter[,5:18]
prop.table.trb2_filter.cd4$sum <- rowSums(prop.table.trb2_filter.cd4)

prop.table.trb2_filter.cd4 <- prop.table.trb2_filter.cd4 %>% filter(sum >0) %>% select(-sum)

res.pca.trb.cd4 <- prcomp(t(prop.table.trb2_filter.cd4), scale = TRUE, center = T)

mdres.pca.trb.cd4  <-  colnames(prop.table.trb_filter[,5:18])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
 

# LN
prop.table.trb_filter %>% colnames
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allVHitsWithScore"         
## [31] "allDHitsWithScore"          "allJHitsWithScore"
prop.table.trb2_filter.cd4.ln <- prop.table.trb_filter[,5:11]
prop.table.trb2_filter.cd4.ln$sum <- rowSums(prop.table.trb2_filter.cd4.ln)

prop.table.trb2_filter.cd4.ln <- prop.table.trb2_filter.cd4.ln %>% filter(sum >0) %>% select(-sum)

res.pca.trb.cd4.ln <- prcomp(t(prop.table.trb2_filter.cd4.ln), scale = TRUE, center = T)

mdres.pca.trb.cd4.ln  <-  colnames(prop.table.trb_filter[,5:11])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  

fviz_pca_ind(res.pca.trb.cd4.ln,
             col.ind = as.factor(mdres.pca.trb.cd4.ln$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.trb.cd4.ln$Organ),
                  color = as.factor(mdres.pca.trb.cd4.ln$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd4_trb_ln.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd4_trb_ln.svg", width = 2.6, height = 1.7)

# Thymus

prop.table.trb_filter %>% colnames
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allVHitsWithScore"         
## [31] "allDHitsWithScore"          "allJHitsWithScore"
prop.table.trb2_filter.cd4.thy <- prop.table.trb_filter[,12:18]
prop.table.trb2_filter.cd4.thy$sum <- rowSums(prop.table.trb2_filter.cd4.thy)

prop.table.trb2_filter.cd4.thy <- prop.table.trb2_filter.cd4.thy %>% filter(sum >0) %>% select(-sum)

res.pca.trb.cd4.thy <- prcomp(t(prop.table.trb2_filter.cd4.thy), scale = TRUE, center = T)

mdres.pca.trb.cd4.thy  <-  colnames(prop.table.trb_filter[,12:18])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
 
fviz_pca_ind(res.pca.trb.cd4.thy,
             col.ind = as.factor(mdres.pca.trb.cd4.thy$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.trb.cd4.thy$Organ),
                  color = as.factor(mdres.pca.trb.cd4.thy$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd4_trb_thy.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd4_trb_thy.svg", width = 2.6, height = 1.7)

CD8

## All

prop.table.trb2_filter.cd8 <- prop.table.trb_filter[,19:32]
prop.table.trb2_filter.cd8$sum <- rowSums(prop.table.trb2_filter.cd8)
## Error in rowSums(prop.table.trb2_filter.cd8): 'x' must be numeric
prop.table.trb2_filter.cd8 <- prop.table.trb2_filter.cd8 %>% filter(sum >0) %>% select(-sum)
## Error in `filter()`:
## ! Problem while computing `..1 = sum > 0`.
## Caused by error in `sum > 0`:
## ! comparison (6) is possible only for atomic and list types
res.pca.trb.cd8 <- prcomp(t(prop.table.trb2_filter.cd8), scale = TRUE, center = T)
## Error in colMeans(x, na.rm = TRUE): 'x' must be numeric
mdres.pca.trb.cd8  <-  colnames(prop.table.trb_filter[,19:32])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  

# LN
prop.table.trb_filter %>% colnames
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allVHitsWithScore"         
## [31] "allDHitsWithScore"          "allJHitsWithScore"
prop.table.trb2_filter.cd8.ln <- prop.table.trb_filter[,19:25]
prop.table.trb2_filter.cd8.ln$sum <- rowSums(prop.table.trb2_filter.cd8.ln)

prop.table.trb2_filter.cd8.ln <- prop.table.trb2_filter.cd8.ln %>% filter(sum >0) %>% select(-sum)

res.pca.trb.cd8.ln <- prcomp(t(prop.table.trb2_filter.cd8.ln), scale = TRUE, center = T)

mdres.pca.trb.cd8.ln  <-  colnames(prop.table.trb_filter[,19:25])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  
fviz_pca_ind(res.pca.trb.cd8.ln,
             col.ind = as.factor(mdres.pca.trb.cd8.ln$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.trb.cd8.ln$Organ),
                  color = as.factor(mdres.pca.trb.cd8.ln$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd8_trb_ln.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd8_trb_ln.svg", width = 2.6, height = 1.7)


# Thymus

prop.table.trb_filter %>% colnames
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allVHitsWithScore"         
## [31] "allDHitsWithScore"          "allJHitsWithScore"
prop.table.trb2_filter.cd8.thy <- prop.table.trb_filter[,26:32]
prop.table.trb2_filter.cd8.thy$sum <- rowSums(prop.table.trb2_filter.cd8.thy)
## Error in rowSums(prop.table.trb2_filter.cd8.thy): 'x' must be numeric
prop.table.trb2_filter.cd8.thy <- prop.table.trb2_filter.cd8.thy %>% filter(sum >0) %>% select(-sum)
## Error in `filter()`:
## ! Problem while computing `..1 = sum > 0`.
## Caused by error in `sum > 0`:
## ! comparison (6) is possible only for atomic and list types
res.pca.trb.cd8.thy <- prcomp(t(prop.table.trb2_filter.cd8.thy), scale = TRUE, center = T)
## Error in colMeans(x, na.rm = TRUE): 'x' must be numeric
mdres.pca.trb.cd8.thy  <-  colnames(prop.table.trb_filter[,26:32])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
 
fviz_pca_ind(res.pca.trb.cd8.thy,
             col.ind = as.factor(mdres.pca.trb.cd8.thy$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.trb.cd8.thy$Organ),
                  color = as.factor(mdres.pca.trb.cd8.thy$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 
## Error in .get_facto_class(X): object 'res.pca.trb.cd8.thy' not found
#ggsave("final_fig/pca/cd8_trb_thy.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd8_trb_thy.svg", width = 2.6, height = 1.7)

Diversity index

immdata_tra_wonkt <- repFilter(immdata_tra, "by.clonotype",
  list(V.name = exclude("TRAV11"), J.name = exclude("TRAJ18")),
  .match = "substring")

## ALL TRA
repDiversity(immdata_tra_wonkt$data) %>% vis(.by = c( "Mouse_strain"), .meta = immdata_tra$meta, 
  .signif.label.size = 0) + ggtitle("TRA without NKT") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70")) + ggtitle("all TRA", subtitle = "")

#ggsave("./final_fig/diversity/chao_tra_wonkt_sign.png", width = 8, height = 13, units = "cm")
#ggsave("./final_fig/diversity/chao_tra_wonkt_sign.svg", width = 8, height = 13, units = "cm")


## ALL TRB
repDiversity(immdata_trb$data) %>% vis(.by = c( "Mouse_strain"), .meta = immdata_trb$meta, 
  .signif.label.size = 0) + ggtitle("TRB without NKT") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70")) + ggtitle("all TRB", subtitle = "")

#ggsave("./final_fig/diversity/chao_trb_sign.png", width = 8, height = 13, units = "cm")
#ggsave("./final_fig/diversity/chao_trb_sign.svg", width = 8, height = 13, units = "cm")


## Chao index
repDiversity(immdata_tra_wonkt$data) %>% vis(.by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_tra$meta, .test = F,
  .signif.label.size = 0) + ggtitle("TRA")

#ggsave("./plots/chao_tra.png", width = 15, height = 11, units = "cm")
#ggsave("./plots/chao_tra.svg", width = 15, height = 11, units = "cm")

repDiversity(immdata_trb$data) %>% vis(.by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_trb$meta, .test = F,
  .signif.label.size = 0) + ggtitle("TRB", subtitle = "")

#ggsave("./plots/chao_trb.png", width = 15, height = 11, units = "cm")
#ggsave("./plots/chao_trb.svg", width = 15, height = 11, units = "cm")
chao_tra <- as.data.frame(repDiversity(immdata_tra_wonkt$data))
chao_trb <- as.data.frame(repDiversity(immdata_trb$data))
chao_tra2 <- chao_tra %>% 
  rownames_to_column("num_id") %>% 
mutate(num_id = str_replace_all(string = num_id, c("new_lib_24" = "lib24_S24_L001",  "new_lib_25" = "lib25_S25_L001"))) %>% 
  separate(num_id, into = c("num_id",NA,NA), sep = "_") %>% 
  mutate(num_id = str_replace(num_id, "lib","")) %>% 
  mutate_at(vars("num_id"), as.numeric) 
  

levels_cd4 <- c("CD4 Thymus WT", "CD4 Thymus CA", "CD4 Thymus CAKR", "CD4 Lymph nodes WT", "CD4 Lymph nodes CA", "CD4 Lymph nodes CAKR")

# Attach metadata sent for sequencing

chao_tra2 <- left_join(chao_tra2, md) %>% select(value = Estimator, Cell_type, Organ, Mouse_strain) %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 
## Error in `left_join()`:
## ! `by` must be supplied when `x` and `y` have no common variables.
## ℹ use by = character()` to perform a cross-join.
chao_tra2 %>% filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = value, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Chao index estimate") + 
  ggtitle("CD4 TRA") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
## Error in `filter()`:
## ! Problem while computing `..1 = Cell_type == "CD4"`.
## Caused by error in `mask$eval_all_filter()`:
## ! object 'Cell_type' not found
#ggsave("./final_fig/diversity/chao_cd4_tra.png", width = 8, height = 10, units = "cm")
#ggsave("./final_fig/diversity/chao_cd4_tra.eps", width = 8, height = 10, units = "cm")

levels_cd8 <- c("CD8 Thymus WT", "CD8 Thymus CA", "CD8 Thymus CAKR", "CD8 Lymph nodes WT", "CD8 Lymph nodes CA", "CD8 Lymph nodes CAKR")

chao_tra2 %>% filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = value, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Chao index estimate") + 
  ggtitle("CD8 TRA") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
## Error in `filter()`:
## ! Problem while computing `..1 = Cell_type == "CD8"`.
## Caused by error in `mask$eval_all_filter()`:
## ! object 'Cell_type' not found
#ggsave("./final_fig/diversity/chao_cd8_tra.png", width = 8, height = 10, units = "cm")
#ggsave("./final_fig/diversity/chao_cd8_tra.eps", width = 8, height = 10, units = "cm")
chao_trb2 <- chao_trb %>% 
  rownames_to_column("num_id") %>% mutate(num_id = str_replace_all(string = num_id, c("new_lib_24" = "lib24_S24_L001",  "new_lib_25" = "lib25_S25_L001"))) %>% 
  separate(num_id, into = c("num_id",NA,NA), sep = "_") %>% 
  mutate(num_id = str_replace(num_id, "lib",""))  %>% 
  mutate_at(vars("num_id"), as.numeric) 
  

levels_cd4 <- c("CD4 Thymus WT", "CD4 Thymus CA", "CD4 Thymus CAKR", "CD4 Lymph nodes WT", "CD4 Lymph nodes CA", "CD4 Lymph nodes CAKR")

chao_trb2 <- left_join(chao_trb2, md) %>% select(value = Estimator, Cell_type, Organ, Mouse_strain) %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 
## Error in `left_join()`:
## ! `by` must be supplied when `x` and `y` have no common variables.
## ℹ use by = character()` to perform a cross-join.
chao_trb2 %>% filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = value, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Chao index estimate") + 
  ggtitle("CD4 TRB") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,125500)) + xlab("")
## Error in `filter()`:
## ! Problem while computing `..1 = Cell_type == "CD4"`.
## Caused by error in `mask$eval_all_filter()`:
## ! object 'Cell_type' not found
#ggsave("./final_fig/diversity/chao_cd4_trb.png", width = 8, height = 10, units = "cm")
#ggsave("./final_fig/diversity/chao_cd4_trb.eps", width = 8, height = 10, units = "cm")

levels_cd8 <- c("CD8 Thymus WT", "CD8 Thymus CA", "CD8 Thymus CAKR", "CD8 Lymph nodes WT", "CD8 Lymph nodes CA", "CD8 Lymph nodes CAKR")


chao_trb2 %>% filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = value, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Chao index estimate") + 
  ggtitle("CD8 TRB") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,125500)) + xlab("")
## Error in `filter()`:
## ! Problem while computing `..1 = Cell_type == "CD8"`.
## Caused by error in `mask$eval_all_filter()`:
## ! object 'Cell_type' not found
#ggsave("./final_fig/diversity/chao_cd8_trb.png", width = 8, height = 10, units = "cm")
#ggsave("./final_fig/diversity/chao_cd8_trb.eps", width = 8, height = 10, units = "cm")
chao_trb2 %>% 
  ggplot(aes(y = value, x = factor(Mouse_strain, levels = c("WT","CA","CAKR")))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.1), size = 2, aes(color = Mouse_strain)) +
ylab("Chao index estimate") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
## Error in factor(Mouse_strain, levels = c("WT", "CA", "CAKR")): object 'Mouse_strain' not found
#ggsave("./final_fig/diversity/chao_all_trb.png", width = 8, height = 6, units = "cm")
#ggsave("./final_fig/diversity/chao_all_trb.eps", width = 8, height = 6, units = "cm")


chao_tra2 %>% 
  ggplot(aes(y = value, x = factor(Mouse_strain, levels = c("WT","CA","CAKR")))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.1), size = 2, aes(color = Mouse_strain)) +
ylab("Chao index estimate") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
## Error in factor(Mouse_strain, levels = c("WT", "CA", "CAKR")): object 'Mouse_strain' not found
#ggsave("./final_fig/diversity/chao_all_tra.png", width = 8, height = 6, units = "cm")
#ggsave("./final_fig/diversity/chao_all_tra.eps", width = 8, height = 6, units = "cm")

Top20 WT LN sequences

topclones <- prop.table.trb_filter %>% mutate(topclones_ln_cd4 = (`CD4 Lymph nodes WT Exp02`+
                                                                       `CD4 Lymph nodes WT Exp03`)/2,
                                               topclones_ln_cd8 = (`CD8 Lymph nodes WT Exp01`+
                                                                       `CD8 Lymph nodes WT Exp02`+
                                                                       `CD8 Lymph nodes WT Exp03`)/3)

cd4_topclones_20sequences_trb <- pull(topclones %>% slice_max(order_by = topclones_ln_cd4, n = 20), aaSeqCDR3)
cd8_topclones_20sequences_trb <- pull(topclones %>% slice_max(order_by = topclones_ln_cd8, n = 20), aaSeqCDR3)
topclones <- prop.table.tra_filter %>% mutate(topclones_ln_cd4 = (`CD4 Lymph nodes WT Exp02`+
                                                                       `CD4 Lymph nodes WT Exp03`)/2,
                                               topclones_ln_cd8 = (`CD8 Lymph nodes WT Exp01`+
                                                                       `CD8 Lymph nodes WT Exp02`+
                                                                       `CD8 Lymph nodes WT Exp03`)/3)

cd4_topclones_20sequences_tra <- pull(topclones %>% slice_max(order_by = topclones_ln_cd4, n = 20), aaSeqCDR3)
cd8_topclones_20sequences_tra <- pull(topclones %>% slice_max(order_by = topclones_ln_cd8, n = 20), aaSeqCDR3)

Summary plot

CD4 TRA

cd4_topclones_rep_pct <- prop.table.tra2 %>% filter(aaSeqCDR3 %in% cd4_topclones_20sequences_tra) %>% select(aaSeqCDR3, starts_with("CD4")) %>% select(1,14,15,9:13,7,8,2:6) 

cd4_topclones_rep_pct2 <- as.data.frame(colSums(cd4_topclones_rep_pct[,2:15]))
colnames(cd4_topclones_rep_pct2) <- "value"

cd4_topclones_rep_pct2 <- cd4_topclones_rep_pct2 %>% 
  rownames_to_column("sample") %>% 
  mutate(sample = stringr::str_replace_all(sample, pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp")) %>% 
  mutate(Cell_type_organ = factor(paste(Organ, Strain), levels = c("Thymus WT","Thymus CA", "Thymus CAKR",
                                                                   "LN WT","LN CA", "LN CAKR" )))

cd4_topclones_rep_pct2
##    Cell_type  Organ Strain   Exp      value Cell_type_organ
## 1        CD4 Thymus     WT Exp02 0.02719600       Thymus WT
## 2        CD4 Thymus     WT Exp03 0.02768635       Thymus WT
## 3        CD4 Thymus     CA Exp01 0.02961628       Thymus CA
## 4        CD4 Thymus     CA Exp03 0.02546552       Thymus CA
## 5        CD4 Thymus   CAKR Exp01 0.02006336     Thymus CAKR
## 6        CD4 Thymus   CAKR Exp02 0.02571295     Thymus CAKR
## 7        CD4 Thymus   CAKR Exp03 0.02394512     Thymus CAKR
## 8        CD4     LN     WT Exp02 0.03291915           LN WT
## 9        CD4     LN     WT Exp03 0.03092514           LN WT
## 10       CD4     LN     CA Exp01 0.03194544           LN CA
## 11       CD4     LN     CA Exp02 0.03571429           LN CA
## 12       CD4     LN     CA Exp03 0.03472770           LN CA
## 13       CD4     LN   CAKR Exp02 0.03056850         LN CAKR
## 14       CD4     LN   CAKR Exp03 0.02854483         LN CAKR
cd4_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>% 
  ggplot(aes(y = value*100, x = factor(Cell_type_organ))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Strain)) + 
  ylab("% of repertoire") + 
  ggtitle("CD4 TRA") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

#ggsave("./final_fig/20topseqs/20topseqs_pct_tra_cd4.png", width = 5, height = 5.7, units = "cm")
#ggsave("./final_fig/20topseqs/20topseqs_pct_tra_cd4.eps", width = 5, height = 5.7, units = "cm")

CD4 TRB

cd4_topclones_rep_pct <- prop.table.trb2 %>% filter(aaSeqCDR3 %in% cd4_topclones_20sequences_trb) %>% select(aaSeqCDR3, starts_with("CD4")) %>% select(1,14,15,9:13,7,8,2:6) 

cd4_topclones_rep_pct2 <- as.data.frame(colSums(cd4_topclones_rep_pct[,2:15]))
colnames(cd4_topclones_rep_pct2) <- "value"

cd4_topclones_rep_pct2 <- cd4_topclones_rep_pct2 %>% 
  rownames_to_column("sample") %>% 
  mutate(sample = stringr::str_replace_all(sample, pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp")) %>% 
  mutate(Cell_type_organ = factor(paste(Organ, Strain), levels = c("Thymus WT","Thymus CA", "Thymus CAKR",
                                                                   "LN WT","LN CA", "LN CAKR" )))

cd4_topclones_rep_pct2
##    Cell_type  Organ Strain   Exp       value Cell_type_organ
## 1        CD4 Thymus     WT Exp02 0.002983516       Thymus WT
## 2        CD4 Thymus     WT Exp03 0.003173193       Thymus WT
## 3        CD4 Thymus     CA Exp01 0.002918556       Thymus CA
## 4        CD4 Thymus     CA Exp03 0.002984742       Thymus CA
## 5        CD4 Thymus   CAKR Exp01 0.002182929     Thymus CAKR
## 6        CD4 Thymus   CAKR Exp02 0.001994960     Thymus CAKR
## 7        CD4 Thymus   CAKR Exp03 0.002340585     Thymus CAKR
## 8        CD4     LN     WT Exp02 0.007174320           LN WT
## 9        CD4     LN     WT Exp03 0.004953674           LN WT
## 10       CD4     LN     CA Exp01 0.004225770           LN CA
## 11       CD4     LN     CA Exp02 0.004535615           LN CA
## 12       CD4     LN     CA Exp03 0.005438813           LN CA
## 13       CD4     LN   CAKR Exp02 0.003940359         LN CAKR
## 14       CD4     LN   CAKR Exp03 0.003868959         LN CAKR
cd4_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>% 
  ggplot(aes(y = value*100, x = factor(Cell_type_organ))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Strain)) + 
  ylab("% of repertoire") + 
  ggtitle("CD4 TRB") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

#ggsave("./final_fig/20topseqs/20topseqs_pct_trb_cd4.png", width = 5.4, height = 5.7, units = "cm")
#ggsave("./final_fig/20topseqs/20topseqs_pct_trb_cd4.eps", width = 5.4, height = 5.7, units = "cm")

CD8 TRA

cd8_topclones_rep_pct <- prop.table.tra2 %>% filter(aaSeqCDR3 %in% cd8_topclones_20sequences_tra) %>% select(aaSeqCDR3, starts_with("CD8")) %>% select(1,13:15,9:12,6:8,2:5) 

cd8_topclones_rep_pct2 <- as.data.frame(colSums(cd8_topclones_rep_pct[,2:15]))
colnames(cd8_topclones_rep_pct2) <- "value"

cd8_topclones_rep_pct2 <- cd8_topclones_rep_pct2 %>% 
  rownames_to_column("sample") %>% 
  mutate(sample = stringr::str_replace_all(sample, pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp")) %>% 
  mutate(Cell_type_organ = factor(paste(Organ, Strain), levels = c("Thymus WT","Thymus CA", "Thymus CAKR",
                                                                   "LN WT","LN CA", "LN CAKR" )))

cd8_topclones_rep_pct2
##    Cell_type  Organ Strain   Exp      value Cell_type_organ
## 1        CD8 Thymus     WT Exp01 0.03400861       Thymus WT
## 2        CD8 Thymus     WT Exp02 0.02653846       Thymus WT
## 3        CD8 Thymus     WT Exp03 0.03137384       Thymus WT
## 4        CD8 Thymus     CA Exp01 0.02447936       Thymus CA
## 5        CD8 Thymus     CA Exp03 0.02870174       Thymus CA
## 6        CD8 Thymus   CAKR Exp02 0.03063281     Thymus CAKR
## 7        CD8 Thymus   CAKR Exp03 0.02280437     Thymus CAKR
## 8        CD8     LN     WT Exp01 0.03753115           LN WT
## 9        CD8     LN     WT Exp02 0.03764212           LN WT
## 10       CD8     LN     WT Exp03 0.03642344           LN WT
## 11       CD8     LN     CA Exp02 0.03592586           LN CA
## 12       CD8     LN     CA Exp03 0.03213014           LN CA
## 13       CD8     LN   CAKR Exp02 0.02946860         LN CAKR
## 14       CD8     LN   CAKR Exp03 0.03059997         LN CAKR
cd8_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>% 
  ggplot(aes(y = value*100, x = factor(Cell_type_organ))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Strain)) + 
  ylab("% of repertoire") + 
  ggtitle("CD8 TRA") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

#ggsave("./final_fig/20topseqs/20topseqs_pct_tra_cd8.png", width = 5, height = 5.7, units = "cm")
#ggsave("./final_fig/20topseqs/20topseqs_pct_tra_cd8.eps", width = 5, height = 5.7, units = "cm")

CD8 TRB

cd8_topclones_rep_pct <- prop.table.trb2 %>% filter(aaSeqCDR3 %in% cd8_topclones_20sequences_trb) %>% select(aaSeqCDR3, starts_with("CD8")) %>% select(1,13:15,9:12,6:8,2:5) 

cd8_topclones_rep_pct2 <- as.data.frame(colSums(cd8_topclones_rep_pct[,2:15]))
colnames(cd8_topclones_rep_pct2) <- "value"

cd8_topclones_rep_pct2 <- cd8_topclones_rep_pct2 %>% 
  rownames_to_column("sample") %>% 
  mutate(sample = stringr::str_replace_all(sample, pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp")) %>% 
  mutate(Cell_type_organ = factor(paste(Organ, Strain), levels = c("Thymus WT","Thymus CA", "Thymus CAKR",
                                                                   "LN WT","LN CA", "LN CAKR" )))

cd8_topclones_rep_pct2
##    Cell_type  Organ Strain   Exp       value Cell_type_organ
## 1        CD8 Thymus     WT Exp01 0.004631440       Thymus WT
## 2        CD8 Thymus     WT Exp02 0.004479484       Thymus WT
## 3        CD8 Thymus     WT Exp03 0.004929262       Thymus WT
## 4        CD8 Thymus     CA Exp01 0.005472183       Thymus CA
## 5        CD8 Thymus     CA Exp03 0.004517467       Thymus CA
## 6        CD8 Thymus   CAKR Exp02 0.003983029     Thymus CAKR
## 7        CD8 Thymus   CAKR Exp03 0.005094131     Thymus CAKR
## 8        CD8     LN     WT Exp01 0.007470700           LN WT
## 9        CD8     LN     WT Exp02 0.006188663           LN WT
## 10       CD8     LN     WT Exp03 0.007226656           LN WT
## 11       CD8     LN     CA Exp02 0.006775749           LN CA
## 12       CD8     LN     CA Exp03 0.007135684           LN CA
## 13       CD8     LN   CAKR Exp02 0.007553967         LN CAKR
## 14       CD8     LN   CAKR Exp03 0.006905450         LN CAKR
cd8_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>% 
  ggplot(aes(y = value*100, x = factor(Cell_type_organ))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Strain)) + 
  ylab("% of repertoire") + 
  ggtitle("CD8 TRB") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

#ggsave("./final_fig/20topseqs/20topseqs_pct_trb_cd8.png", width = 5.4, height = 5.7, units = "cm")
#ggsave("./final_fig/20topseqs/20topseqs_pct_trb_cd8.eps", width = 5.4, height = 5.7, units = "cm")

Heatmaps

TRA

CD4

# TRA
cd4_topclones_heatmap_matrix <- prop.table.tra2 %>% filter(aaSeqCDR3 %in% cd4_topclones_20sequences_tra) %>% select(aaSeqCDR3, starts_with("CD4")) %>% select(1,14,15,9:13,7,8,2:6)

cd4_topclones_heatmap_matrix2 <- as.matrix(cd4_topclones_heatmap_matrix[,2:15])
rownames(cd4_topclones_heatmap_matrix2) <- cd4_topclones_heatmap_matrix$aaSeqCDR3

cd4_topclones_heatmap_matrix2 <- cd4_topclones_heatmap_matrix2[match( cd4_topclones_20sequences_tra, rownames(cd4_topclones_heatmap_matrix2)),]

pheatmap(cd4_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)

#ggsave(plot = pheatmap(cd4_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F), "final_fig/heatmap_top20_ordered/cd4_tra_noscale.svg")

CD8

# TRA

cd8_topclones_heatmap_matrix <- prop.table.tra2 %>% filter(aaSeqCDR3 %in% cd8_topclones_20sequences_tra) %>% select(aaSeqCDR3, starts_with("CD8")) %>% select(1,13:15,9:12,6:8,2:5)
  

cd8_topclones_heatmap_matrix2 <- as.matrix(cd8_topclones_heatmap_matrix[,2:15])
rownames(cd8_topclones_heatmap_matrix2) <- cd8_topclones_heatmap_matrix$aaSeqCDR3

cd8_topclones_heatmap_matrix2 <- cd8_topclones_heatmap_matrix2[match( cd8_topclones_20sequences_tra, rownames(cd8_topclones_heatmap_matrix2)),]

pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)

#ggsave(plot = pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F), "final_fig/heatmap_top20_ordered/cd8_tra_noscale.svg")

without 1st sequence

### without 1st sequence

cd8_topclones_heatmap_matrix2 <- as.matrix(cd8_topclones_heatmap_matrix[2:20,2:15])
rownames(cd8_topclones_heatmap_matrix2) <- cd8_topclones_heatmap_matrix$aaSeqCDR3[2:20]

cd8_topclones_heatmap_matrix2 <- cd8_topclones_heatmap_matrix2[match( cd8_topclones_20sequences_tra[2:20], rownames(cd8_topclones_heatmap_matrix2)),]

pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)

#ggsave(plot = pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F), "final_fig/heatmap_top20_ordered/cd8_tra_noscale_without.svg")

TRB

CD4

# trb

cd4_topclones_heatmap_matrix <- prop.table.trb2 %>% filter(aaSeqCDR3 %in% cd4_topclones_20sequences_trb) %>% select(aaSeqCDR3, starts_with("CD4")) %>% select(1,14,15,9:13,7,8,2:6)
  
cd4_topclones_heatmap_matrix2 <- as.matrix(cd4_topclones_heatmap_matrix[,2:15])
rownames(cd4_topclones_heatmap_matrix2) <- cd4_topclones_heatmap_matrix$aaSeqCDR3

cd4_topclones_heatmap_matrix2 <- cd4_topclones_heatmap_matrix2[match( cd4_topclones_20sequences_trb, rownames(cd4_topclones_heatmap_matrix2)),]

pheatmap(cd4_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)

#ggsave(plot = pheatmap(cd4_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F), "final_fig/heatmap_top20_ordered/cd4_trb_noscale.svg")

CD8

# trb

cd8_topclones_heatmap_matrix <- prop.table.trb2 %>% filter(aaSeqCDR3 %in% cd8_topclones_20sequences_trb) %>% select(aaSeqCDR3, starts_with("CD8")) %>% select(1,13:15,9:12,6:8,2:5)
  
cd8_topclones_heatmap_matrix2 <- as.matrix(cd8_topclones_heatmap_matrix[,2:15])
rownames(cd8_topclones_heatmap_matrix2) <- cd8_topclones_heatmap_matrix$aaSeqCDR3

cd8_topclones_heatmap_matrix2 <- cd8_topclones_heatmap_matrix2[match( cd8_topclones_20sequences_trb, rownames(cd8_topclones_heatmap_matrix2)),]

pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)

# png("final_fig/heatmap_top20_ordered/cd8_trb_noscale.png", width = 400, height = 500)
# pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)
# dev.off()

#ggsave(plot = pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F), "final_fig/heatmap_top20_ordered/cd8_trb_noscale.svg")

Repertoire overlap

binary_tra <- excel_count_table_tra %>% 
  mutate(nkt_trav11_traj18 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11") | 
    (grepl(allJHitsWithScore, pattern = "TRAJ18"))),"yes","no")) %>% 
  filter(nkt_trav11_traj18 == "no") %>% 
  mutate_at(vars(starts_with("CD")), .funs = binary) 

# count_overlap <- function(df, column_number){
#   repertoire <- df %>% select(vars(column_number), aaSeqCDR3) %>% filter(vars(column_number)>0) %>% pull("aaSeqCDR3")
#   nrow_column <- length(repertoire)
#   intersect_rept <- length(intersect(repertoire, most_diverse_repertoire))
#   pct <- intersect_rept/length()
#   return(x)
# }

binary_tra_longer <- binary_tra %>% select(starts_with("CD"), aaSeqCDR3) %>% pivot_longer(!aaSeqCDR3, names_to = "num_id")

df_all4 <- data.frame("")

for(j in levels(factor(binary_tra_longer$num_id))){
  subset1 <- binary_tra_longer %>% filter(num_id == j & value>0) %>% pull("aaSeqCDR3")
  vector_overlap <- c()
  for(i in levels(factor(binary_tra_longer$num_id))){
    subset2 <- binary_tra_longer %>% filter(num_id == i & value>0) %>% pull("aaSeqCDR3")
    intersect_rep <- length(intersect(subset1, subset2))
    total <- length(subset2)
    vector_overlap <- c(vector_overlap,intersect_rep/total)
  }
  df <- as.data.frame(x = vector_overlap)
  colnames(df) <- j
  df
  df_all4 <- cbind(df_all4, df)
}

df_all4 <- df_all4[,2:29]
rownames(df_all4) <- colnames(df_all4)

#write.csv(df_all4, "rep_overlap_tra.csv")
df24 <- df_all4
df24[df24 == 1] <- 0

### plot dotplot of overlaps

df25 <- df24 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df25, aes(id, factor(name, levels = rev(levels(factor(name)))))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(7,12)) +
  geom_text(aes(label = round(value*100, digits = 1))) + 
  scale_colour_gradient2(low = "lightskyblue", mid = "lightsteelblue2", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) 

Rozdelene na CD4 a CD8

rep_overlap_tra <- read_csv("rep_overlap_tra.csv") %>% column_to_rownames("...1")

rep_overlap_tra_cd4 <- rep_overlap_tra[1:14,1:14]
rep_overlap_tra_cd8 <- rep_overlap_tra[15:28,15:28]

rep_overlap_tra_cd4 <- rep_overlap_tra_cd4[ rev(c(5, 4, 3, 2, 1, 7, 6, 12, 11, 10, 9, 8, 14, 13)),
                                            rev(c(5, 4, 3, 2, 1, 7, 6, 12, 11, 10, 9, 8, 14, 13))]

levels_names <- colnames(rep_overlap_tra_cd4)

df_overlap_tra_cd4 <- rep_overlap_tra_cd4
df_overlap_tra_cd4[df_overlap_tra_cd4 == 1] <- 0

### plot dotplot of overlaps

df_overlap_tra_cd4 <- df_overlap_tra_cd4 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df_overlap_tra_cd4, aes(x = factor(name, levels = levels_names), y = factor(id, levels = rev(levels_names)))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(5,9)) +
  geom_text(aes(label = round(value*100, digits = 1)), size = 3) + 
  scale_colour_gradient2(low = "white", mid = "lightsteelblue", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) + ggtheme()

#ggsave(file = "./final_fig/overlap_cd4_tra.png", width = 20, height = 16, units = "cm")
#ggsave(file = "./final_fig/overlap_cd4_tra.svg", width = 20, height = 16, units = "cm")
rep_overlap_tra_cd8 <- rep_overlap_tra_cd8[rev(c(12:14,8:11,5:7,1:4)),rev(c(12:14,8:11,5:7,1:4))]

levels_names <- rev(colnames(rep_overlap_tra_cd8))

df_overlap_tra_cd8 <- rep_overlap_tra_cd8
df_overlap_tra_cd8[df_overlap_tra_cd8 == 1] <- 0

### plot dotplot of overlaps

df_overlap_tra_cd8 <- df_overlap_tra_cd8 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df_overlap_tra_cd8, aes(factor(name, levels = levels_names), factor(id, levels = rev(levels_names)))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(5,9)) +
  geom_text(aes(label = round(value*100, digits = 1)), size = 3) + 
  scale_colour_gradient2(low = "white", mid = "lightsteelblue", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) + ggtheme()

#ggsave(file = "./final_fig/overlap_cd8_tra.png", width = 20, height = 16, units = "cm")
#ggsave(file = "./final_fig/overlap_cd8_tra.svg", width = 20, height = 16, units = "cm")

A la violin

violin_tra_cd4 <- rep_overlap_tra_cd4 %>% 
  rownames_to_column("sample1") %>% 
  pivot_longer(!sample1, names_to = "sample2") %>% 
  mutate(sample2 = stringr::str_replace_all(sample2, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  mutate(sample1 = stringr::str_replace_all(sample1, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  separate(sample1, into = c("Cell_type1","Organ1","Strain1","Exp1"), remove = F)  %>% 
  separate(sample2, into = c("Cell_type2","Organ2","Strain2","Exp2"), remove = F) %>% 
  filter(value<1)


violin_tra_cd8 <- rep_overlap_tra_cd8 %>% 
  rownames_to_column("sample1") %>% 
  pivot_longer(!sample1, names_to = "sample2") %>% 
  mutate(sample2 = stringr::str_replace_all(sample2, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  mutate(sample1 = stringr::str_replace_all(sample1, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  separate(sample1, into = c("Cell_type1","Organ1","Strain1","Exp1"), remove = F)  %>% 
  separate(sample2, into = c("Cell_type2","Organ2","Strain2","Exp2"), remove = F) %>% 
  filter(value<1)
violin_tra_cd4 %>% filter(Organ1 == "Thymus" & Organ2 == "Thymus") %>% 
  mutate(organ_strain1 = paste(Organ1, Strain1), 
                          organ_strain2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strain1, organ_strain2, sep = " - ")) %>% 
  filter(comparison %in% c("Thymus WT - Thymus WT", "Thymus CA - Thymus WT", "Thymus CAKR - Thymus WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD4 TRA Thymus") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.35)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd4_tra_thy.png", width = 9.5, height = 9.5,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd4_tra_thy.eps", width = 9.5, height = 9.5,  units = "cm")
violin_tra_cd4 %>% filter(Organ1 == "LN" & Organ2 == "LN") %>% 
  mutate(organ_strain1 = paste(Organ1, Strain1), 
                          organ_strain2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strain1, organ_strain2, sep = " - ")) %>% 
  filter(comparison %in% c("LN WT - LN WT", "LN CA - LN WT", "LN CAKR - LN WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD4 TRA LN") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.35)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd4_tra_LN.png", width = 8, height = 8,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd4_tra_LN.eps", width = 8, height = 8,  units = "cm")
violin_tra_cd8 %>% filter(Organ1 == "Thymus" & Organ2 == "Thymus") %>% 
  mutate(organ_strain1 = paste(Organ1, Strain1), 
                          organ_strain2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strain1, organ_strain2, sep = " - ")) %>% 
  filter(comparison %in% c("Thymus WT - Thymus WT", "Thymus CA - Thymus WT", "Thymus CAKR - Thymus WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD8 TRA Thymus") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.35)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd8_tra_thy.png", width = 9.5, height = 9.5,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd8_tra_thy.eps", width = 9.5, height = 9.5,  units = "cm")
violin_tra_cd8 %>% filter(Organ1 == "LN" & Organ2 == "LN") %>% 
  mutate(organ_strain1 = paste(Organ1, Strain1), 
                          organ_strain2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strain1, organ_strain2, sep = " - ")) %>% 
  filter(comparison %in% c("LN WT - LN WT", "LN CA - LN WT", "LN CAKR - LN WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD8 TRA LN") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.35)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd8_tra_LN.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd8_tra_LN.eps", width = 8, height = 8, units = "cm")

TRB

Vsechny vzorky se vsema

binary_trb <- excel_count_table_trb %>% 
  mutate_at(vars(starts_with("CD")), .funs = binary) 

binary_trb_longer <- binary_trb %>% select(starts_with("CD"), aaSeqCDR3) %>% pivot_longer(!aaSeqCDR3, names_to = "num_id")

df_all4 <- data.frame("")

for(j in levels(factor(binary_trb_longer$num_id))){
  subset1 <- binary_trb_longer %>% filter(num_id == j & value>0) %>% pull("aaSeqCDR3")
  vector_overlap <- c()
  for(i in levels(factor(binary_trb_longer$num_id))){
    subset2 <- binary_trb_longer %>% filter(num_id == i & value>0) %>% pull("aaSeqCDR3")
    intersect_rep <- length(intersect(subset1, subset2))
    total <- length(subset2)
    vector_overlap <- c(vector_overlap,intersect_rep/total)
  }
  df <- as.data.frame(x = vector_overlap)
  colnames(df) <- j
  df
  df_all4 <- cbind(df_all4, df)
}

df_all4 <- df_all4[,2:29]
rownames(df_all4) <- colnames(df_all4)

#write.csv(df_all4, "rep_overlap_trb.csv")
df24 <- df_all4
df24[df24 == 1] <- 0

### plot dotplot of overlaps

df25 <- df24 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df25, aes(id, factor(name, levels = rev(levels(factor(name)))))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(7,12)) +
  geom_text(aes(label = round(value*100, digits = 1))) + 
  scale_colour_gradient2(low = "lightskyblue", mid = "lightsteelblue2", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) 

Rozdelene na CD4 a CD8

rep_overlap_trb <- read_csv("rep_overlap_trb.csv") %>% column_to_rownames("...1")

rep_overlap_trb_cd4 <- rep_overlap_trb[1:14,1:14]
rep_overlap_trb_cd8 <- rep_overlap_trb[15:28,15:28]

rep_overlap_trb_cd4 <- rep_overlap_trb_cd4[rev(c(13,14,8:12,6,7,1:5)),rev(c(13,14,8:12,6,7,1:5))]

levels_names <- rev(colnames(rep_overlap_trb_cd4))


df_overlap_trb_cd4 <- rep_overlap_trb_cd4
df_overlap_trb_cd4[df_overlap_trb_cd4 == 1] <- 0

### plot dotplot of overlaps

df_overlap_trb_cd4 <- df_overlap_trb_cd4 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df_overlap_trb_cd4, aes(factor(name, levels = levels_names), factor(id, levels = rev(levels_names)))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(5,9)) +
  geom_text(aes(label = round(value*100, digits = 1)), size = 3) + 
  scale_colour_gradient2(low = "white", mid = "lightsteelblue", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) + ggtheme()

#ggsave(file = "./final_fig/overlap_cd4_trb.png", width = 20, height = 16, units = "cm")
#ggsave(file = "./final_fig/overlap_cd4_trb.svg", width = 20, height = 16, units = "cm")
rep_overlap_trb_cd8 <- rep_overlap_trb_cd8[rev(c(12:14,8:11,5:7,1:4)),rev(c(12:14,8:11,5:7,1:4))]

levels_names <- rev(colnames(rep_overlap_trb_cd8))

df_overlap_trb_cd8 <- rep_overlap_trb_cd8
df_overlap_trb_cd8[df_overlap_trb_cd8 == 1] <- 0

### plot dotplot of overlaps

df_overlap_trb_cd8 <- df_overlap_trb_cd8 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df_overlap_trb_cd8, aes(factor(name, levels = levels_names), factor(id, levels = rev(levels_names)))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(5,9)) +
  geom_text(aes(label = round(value*100, digits = 1)), size = 3) + 
  scale_colour_gradient2(low = "white", mid = "lightsteelblue", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) + ggtheme()

#ggsave(file = "./final_fig/overlap_cd8_trb.png", width = 20, height = 16, units = "cm")
#ggsave(file = "./final_fig/overlap_cd8_trb.svg", width = 20, height = 16, units = "cm")

A la violin

violin_trb_cd4 <- rep_overlap_trb_cd4 %>% 
  rownames_to_column("sample1") %>% 
  pivot_longer(!sample1, names_to = "sample2") %>% 
  mutate(sample2 = stringr::str_replace_all(sample2, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  mutate(sample1 = stringr::str_replace_all(sample1, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  separate(sample1, into = c("Cell_type1","Organ1","Strain1","Exp1"), remove = F)  %>% 
  separate(sample2, into = c("Cell_type2","Organ2","Strain2","Exp2"), remove = F) %>% 
  filter(value<1)

violin_trb_cd8 <- rep_overlap_trb_cd8 %>% 
  rownames_to_column("sample1") %>% 
  pivot_longer(!sample1, names_to = "sample2") %>% 
  mutate(sample2 = stringr::str_replace_all(sample2, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  mutate(sample1 = stringr::str_replace_all(sample1, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  separate(sample1, into = c("Cell_type1","Organ1","Strain1","Exp1"), remove = F)  %>% 
  separate(sample2, into = c("Cell_type2","Organ2","Strain2","Exp2"), remove = F) %>% 
  filter(value<1)
violin_trb_cd8 %>% filter(Organ1 == "Thymus" & Organ2 == "Thymus") %>% 
  mutate(organ_strbin1 = paste(Organ1, Strain1), 
                          organ_strbin2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strbin1, organ_strbin2, sep = " - ")) %>% 
  filter(comparison %in% c("Thymus WT - Thymus WT", "Thymus CA - Thymus WT", "Thymus CAKR - Thymus WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD8 TRB Thymus") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.25)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd8_trb_thy.png", width = 9.5, height = 9.5,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd8_trb_thy.eps", width = 9.5, height = 9.5,  units = "cm")
violin_trb_cd8 %>% filter(Organ1 == "LN" & Organ2 == "LN") %>% 
  mutate(organ_strbin1 = paste(Organ1, Strain1), 
                          organ_strbin2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strbin1, organ_strbin2, sep = " - ")) %>% 
  filter(comparison %in% c("LN WT - LN WT", "LN CA - LN WT", "LN CAKR - LN WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD8 TRB LN") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.25)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd8_trb_LN.png", width = 8, height = 8,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd8_trb_LN.eps", width = 8, height = 8,  units = "cm")
violin_trb_cd4 %>% filter(Organ1 == "Thymus" & Organ2 == "Thymus") %>% 
  mutate(organ_strbin1 = paste(Organ1, Strain1), 
                          organ_strbin2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strbin1, organ_strbin2, sep = " - ")) %>% 
  filter(comparison %in% c("Thymus WT - Thymus WT", "Thymus CA - Thymus WT", "Thymus CAKR - Thymus WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD4 TRB Thymus") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.20)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd4_trb_thy.png", width = 9.5, height = 9.5,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd4_trb_thy.eps", width = 9.5, height = 9.5,  units = "cm")
violin_trb_cd4 %>% filter(Organ1 == "LN" & Organ2 == "LN") %>% 
  mutate(organ_strbin1 = paste(Organ1, Strain1), 
                          organ_strbin2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strbin1, organ_strbin2, sep = " - ")) %>% 
  filter(comparison %in% c("LN WT - LN WT", "LN CA - LN WT", "LN CAKR - LN WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD4 TRB LN") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.20)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd4_trb_LN.png", width = 8, height = 8,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd4_trb_LN.eps", width = 8, height = 8,  units = "cm")

SessionInfo

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] SmartEDA_0.3.8    kableExtra_1.3.4  factoextra_1.0.7  pheatmap_1.0.12  
##  [5] immunarch_0.6.9   data.table_1.14.2 dtplyr_1.2.2      patchwork_1.1.2  
##  [9] cowplot_1.1.1     readxl_1.4.1      forcats_0.5.2     stringr_1.4.1    
## [13] dplyr_1.0.9       purrr_0.3.4       readr_2.1.2       tidyr_1.2.0      
## [17] tibble_3.1.8      ggplot2_3.3.6     tidyverse_1.3.1  
## 
## loaded via a namespace (and not attached):
##   [1] uuid_1.1-0          backports_1.4.1     circlize_0.4.15    
##   [4] fastmatch_1.1-3     systemfonts_1.0.4   plyr_1.8.7         
##   [7] igraph_1.3.4        digest_0.6.29       foreach_1.5.2      
##  [10] htmltools_0.5.3     ggalluvial_0.12.3   fansi_1.0.3        
##  [13] magrittr_2.0.3      cluster_2.1.4       doParallel_1.0.17  
##  [16] tzdb_0.3.0          modelr_0.1.9        vroom_1.5.7        
##  [19] svglite_2.1.0       lpSolve_5.6.15      rmdformats_1.0.4   
##  [22] colorspace_2.0-3    rvest_1.0.3         ggrepel_0.9.1      
##  [25] haven_2.5.1         xfun_0.31           crayon_1.5.1       
##  [28] jsonlite_1.8.0      phangorn_2.9.0      iterators_1.0.14   
##  [31] ape_5.6-2           glue_1.6.2          gtable_0.3.0       
##  [34] webshot_0.5.3       UpSetR_1.4.0        car_3.1-0          
##  [37] kernlab_0.9-31      shape_1.4.6         prabclus_2.3-2     
##  [40] DEoptimR_1.0-11     ISLR_1.4            abind_1.4-5        
##  [43] scales_1.2.1        DBI_1.1.3           GGally_2.1.2       
##  [46] rstatix_0.7.0       Rcpp_1.0.9          viridisLite_0.4.1  
##  [49] xtable_1.8-4        bit_4.0.4           mclust_5.4.10      
##  [52] stats4_4.2.1        sampling_2.9        httr_1.4.4         
##  [55] RColorBrewer_1.1-3  fpc_2.2-9           modeltools_0.2-23  
##  [58] ellipsis_0.3.2      farver_2.1.1        pkgconfig_2.0.3    
##  [61] reshape_0.8.9       flexmix_2.3-18      nnet_7.3-17        
##  [64] sass_0.4.2          ggseqlogo_0.1       dbplyr_2.2.1       
##  [67] utf8_1.2.2          labeling_0.4.2      tidyselect_1.1.2   
##  [70] rlang_1.0.4         reshape2_1.4.4      later_1.3.0        
##  [73] munsell_0.5.0       cellranger_1.1.0    tools_4.2.1        
##  [76] cachem_1.0.6        cli_3.3.0           generics_0.1.3     
##  [79] broom_1.0.0         evaluate_0.16       fastmap_1.1.0      
##  [82] yaml_2.3.5          bit64_4.0.5         knitr_1.39         
##  [85] fs_1.5.2            robustbase_0.95-0   nlme_3.1-159       
##  [88] mime_0.12           xml2_1.3.3          compiler_4.2.1     
##  [91] shinythemes_1.2.0   rstudioapi_0.14     ggsignif_0.6.3     
##  [94] reprex_2.0.2        bslib_0.4.0         stringi_1.7.8      
##  [97] highr_0.9           lattice_0.20-45     Matrix_1.4-1       
## [100] vctrs_0.4.1         stringdist_0.9.8    pillar_1.8.1       
## [103] lifecycle_1.0.1     jquerylib_0.1.4     GlobalOptions_0.1.2
## [106] httpuv_1.6.5        R6_2.5.1            bookdown_0.27      
## [109] promises_1.2.0.1    gridExtra_2.3       codetools_0.2-18   
## [112] MASS_7.3-58.1       assertthat_0.2.1    withr_2.5.0        
## [115] rlist_0.4.6.2       diptest_0.76-0      parallel_4.2.1     
## [118] hms_1.1.2           quadprog_1.5-8      grid_4.2.1         
## [121] class_7.3-20        rmarkdown_2.14      carData_3.0-5      
## [124] ggpubr_0.4.0        shiny_1.7.2         lubridate_1.8.0